Abstract 

We investigate the boundary between classical and quantum computational power. This 
work consists of two parts. First we develop new classical simulation algorithms that are 
centered on sampling methods. Using these techniques we generate new classes of classically 
simulatable quantum circuits where standard techniques relying on the exact computation of 
measurement probabilities fail to provide efficient simulations. For example, we show how 
various concatenations of matchgate, Toffoli, Clifford, bounded-depth, Fourier transform and 
other circuits are classically simulatable. We also prove that sparse quantum circuits as well 
as circuits composed of CNOT and exp[iθX] gates can be simulated classically. In a second 
part, we apply our results to the simulation of quantum algorithms. It is shown that a recent 
quantum algorithm, concerned with the estimation of Potts model partition functions, can be 
simulated efficiently classically. Finally, we show that the exponential speed-ups of Simon's 
and Shor's algorithms crucially depend on the very last stage in these algorithms, dealing with 
the classical postprocessing of the measurement outcomes. Specifically, we prove that both 
algorithms would be classically simulatable if the function classically computed in this step 
had a sufficiently peaked Fourier spectrum. 

1 Introduction 

What is the power of quantum computers compared to classical ones? Understanding this funda- 
mental but difficult question is one of the great challenges in the field of quantum computation. 

A fruitful approach to tackle this problem is to study classes of quantum computations that 
do not offer any computational benefits over classical computation. Indeed, such investigations 
shed light on the essential features of quantum mechanics that are responsible for quantum com- 
putational power. At the same time, understanding which classes of quantum computations can 
be simulated classically provides useful insights in the difficult task of constructing novel quantum 
algorithms, potentially yielding indications on where to look for new algorithmic primitives. 

In recent years several non-trivial classes of quantum computations have been identified for 
which an efficient classical simulation can be achieved. For example, certain computations are 
classically simulatable due to the absence of high amounts of entanglement (quantified 
appropriately in terms of suitable entanglement measures) [1, 2, 3, 4, 5]. Other well known results are 
the Gottesman-Knill theorem [6, 7, 8, 9, 10] and the classical simulation of matchgate circuits 
[11, 12, 13, 14, 15]. The latter two classes of results provide key illustrations of the fascinating and 
puzzling relation between classical and quantum computational power, as they e.g. regard 
computations that may exhibit large degrees of entanglement, interference, superposition, etc., i.e. the 
ingredients that supposedly provide QC with its increased power, but which nevertheless cannot 
achieve any computational speed-up over classical computers. 

A common element in many existing classical simulation results and methods is the notion of 
classical simulation that is, sometimes implicitly, adopted in these works. When a quantum com- 
putation is to be simulated classically, the goal may be to either classically compute measurement 
probabilities (or expectation values) with high precision in poly-time ("strong simulation"), or to 
classically sample in poly-time from the resulting output probability distribution ("weak simu- 
lation"). Given the intrinsic probabilistic nature of quantum mechanics, it is readily motivated 
that weak simulation is the more natural notion of what a classical simulation should constitute. 
Furthermore, one may easily construct examples of quantum circuit classes for which strong sim- 
ulation is intractable whereas weak simulation is achieved by elementary sampling methods (see 
e.g. [10]), hence showing that a gap between strong and weak simulations manifests itself already 
in elementary scenarios. The latter gap moreover highlights that any serious attempt to compare 
classical with quantum computational power should not be based on strong simulation methods. 

In spite of these basic and well-known insights, the majority of existing results on classical 
simulation of QC regard the strong variant, and weak simulation techniques seem to date largely 
unexplored. The goal of the present work is to develop new classical simulation algorithms that 
are based on sampling methods and to therewith initiate an investigation of the potential of weak 
simulation of quantum computation. Next we state more precisely the contributions of this work. 

2 Statement of results 

Classical simulation of QC with probabilistic methods 

In a first part of the paper, we develop tools to investigate weak classical simulation of QC. A central 
ingredient in our analysis will be a certain class of quantum states, called here computationally 
tractable states (CT states). Colloquially speaking, a state $|\psi\rangle$ is CT if it is possible to classically 
simulate computational basis measurements on $|\psi\rangle$ and if the coefficients of $|\psi\rangle$ in this basis can 
be efficiently computed. As we will see, many important state families — matrix product states, 
stabilizer states, states generated by poly-size matchgate circuits, and several others — turn out to 
be CT. A second element will be the notion of efficiently computable sparse operators (ECS). An 
n-qubit operation is ECS if its matrix representation in the standard basis has at most poly(n) 
nonzero entries per row and per column, and if these entries can be determined efficiently. For 
example, all Pauli products, $k$-local operators with $k = O(\log n)$, as well as operators that can be 
written as poly-size circuits of Toffoli gates, are ECS. We will prove the following result. 

Theorem 1 Consider a poly-size quantum circuit $U$ acting on a state $|\psi\rangle$ and followed by 
measurement of an observable $O$. If $|\psi\rangle$ is computationally tractable and if $U^\dagger O U$ is efficiently 
computable sparse, then this quantum computation can be simulated classically. 

An immediate remark to be made is that the unitary operation U itself is not required to be sparse — 
only its action on O is to yield an ECS operation, which is a significantly distinct requirement. For 
example, if U is a poly-size circuit consisting of nearest neighbor matchgates (which is generally 
not sparse at all) then $U^\dagger Z U$ is a linear combination of poly(n) Pauli products, which is an ECS 
operation. 
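
To make the notion of an efficiently computable sparse operator concrete, the following minimal Python sketch returns the single nonzero entry in a given row of an n-qubit Pauli product. It is an illustration only and not taken from the paper: the function name `pauli_row` and the encoding of a Pauli product as a string over 'I', 'X', 'Y', 'Z' are assumptions made here.

```python
def pauli_row(pauli: str, x: tuple) -> tuple:
    """Return (y, value) where value = <x|P|y> is the only nonzero entry in row x.

    `pauli` is a string over 'I', 'X', 'Y', 'Z' (one letter per qubit) and
    `x` is a tuple of bits of the same length. Both names are hypothetical.
    """
    if len(pauli) != len(x):
        raise ValueError("pauli string and bit string must have equal length")
    y = []
    value = 1 + 0j
    for p, b in zip(pauli, x):
        if p == "I":
            y.append(b)
        elif p == "X":
            y.append(1 - b)          # X flips the bit, entry 1
        elif p == "Z":
            y.append(b)              # Z is diagonal with entries +1, -1
            value *= -1 if b else 1
        elif p == "Y":
            y.append(1 - b)          # <0|Y|1> = -i and <1|Y|0> = +i
            value *= 1j if b else -1j
        else:
            raise ValueError(f"unknown Pauli letter {p!r}")
    return tuple(y), value
```

Each row contains exactly one nonzero entry, and that entry is found in O(n) time, so in this sketch a Pauli product is 1-sparse and its entries are efficiently computable.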

Theorem 1 identifies a general scenario in which quantum circuits can be simulated efficiently 
classically. This result turns out to be rather versatile and will be useful in a number of contexts. 
In this work we highlight the following particular applications (however, it is likely that this result 
has applications beyond the ones considered here): 

Figure 1: The above concatenated quantum circuits can be efficiently simulated classically via an 
application of theorem 1. See section 5.2 for a discussion of these examples. 



• Sparse circuits. A simple instance of theorem 1 is obtained by considering a product input 
state (which is trivially CT) and the Z observable on, say, the first qubit, and by letting the 
circuit U itself be an ECS operation (in which case $U^\dagger Z U$ is ECS as well). Then, by virtue 
of theorem 1, the resulting quantum computation can be simulated classically. In fact, one 
can immediately extend this result by composing m efficiently computable s-sparse¹ unitary 
operations with $s^m = \mathrm{poly}(n)$. Then the overall circuit will still be ECS, as can easily be 
verified, and thus can be simulated classically due to theorem 1. 

Sparse unitary operations are of interest because they highlight the role of interference in 
quantum computation, as opposed to entanglement. In particular, sparse operations may 
produce highly entangled states but the interference exhibited in any sparse unitary evolution 
is always limited. As we will show, this absence of high degrees of interference can be 
exploited to construct an efficient classical simulation algorithm, in spite of the potentially 
complex entangled states produced throughout the computation. This provides (yet another) 
illustration that the presence of entanglement is by no means sufficient to guarantee quantum 
computational speed-ups. Sparse operations furthermore provide examples of a class of QCs 
where weak classical simulation is efficiently possible, whereas strong simulation is intractable 
(#P-hard). In other words, adopting the notion of weak simulation constitutes a necessary 
ingredient in the simulation of sparse circuits whereas strong simulation methods such as e.g. 
tensor contraction schemes cannot (unless #P = P) yield an efficient classical simulation. 

• Composability. Instead of letting $|\psi\rangle$ be a simple product input state, we may also consider 
more complicated CT states which are e.g. the result of an earlier quantum computation, 
i.e. $|\psi\rangle = U'|\psi_{in}\rangle$ for some simple (e.g. standard basis) input $|\psi_{in}\rangle$. As long as $|\psi\rangle$ is CT 
and subsequently a circuit U is applied followed by measurement of O such that $U^\dagger O U$ is 
ECS, the overall quantum circuit $UU'$, acting on $|\psi_{in}\rangle$ and followed by measurement of O, 
can be simulated classically by theorem 1. One hence arrives at a criterion to assess when the 
concatenation of two quantum circuits can be simulated classically. 

Since the majority of existing efficiently simulatable circuits turn out to generate CT states 
when acting on suitable inputs and as at the same time many simulatable operations yield 
ECS operations when acting on suitable observables, the above composability result is ap- 
plicable to a wide variety of settings. In particular, this result applies to Clifford operations, 
matchgate circuits, bounded-depth circuits, classical circuits, bounded-treewidth circuits, the 
quantum Fourier transform, and others. This leads to sometimes surprising examples of 
concatenated circuits that can be simulated classically (cf. Fig. 1). As illustrated in these 
examples, the concatenation of simulatable blocks of very different nature may remain 
efficiently simulatable classically (consider e.g. the concatenation of a Clifford with a matchgate 
circuit). 

¹ An operator is s-sparse if its standard basis matrix representation has at most s nonzero entries per row and 
per column. 

[Circuit diagram: input qubits |0⟩, followed by three rounds labeled H, PERMUTATION, H.] 

Figure 2: Both the factoring algorithm and Simon's algorithm can be implemented by a circuit with 
the following structure. The first and third round in the circuit consist of collections of Hadamard 
operations applied to certain subsets of the qubits; the second round is a unitary operation that acts 
as a permutation on the computational basis. The circuit is followed by a $\{|0\rangle, |1\rangle\}$ measurement of 
a subset of the qubits. The algorithm concludes with classical postprocessing of the measurement 
results. 

It is interesting to compare the examples in Fig. 1 to powerful quantum algorithms such 
as Simon's and Shor's. Strikingly, the latter algorithms are implemented with particularly 
simple circuitry, arguably even simpler than the classically simulatable circuits displayed in 
Fig. 1. In particular, it is known that both the factoring algorithm and Simon's algorithm 
can be efficiently implemented by a circuit with the very simple structure of Fig. 2 [16, 17]. 
Intriguingly, this circuit is the composition of only three blocks, each of which is elementary. 
Nevertheless, our simulation techniques cannot be successfully applied to yield an efficient 
classical simulation of this circuit class. In the second part of this work we investigate the 
hardness of simulating these circuits and, by extension, Simon's and Shor's algorithms, in 
more detail. 

• CNOT-$e^{i\theta X}$ circuits. As a further application of theorem 1, we will show that poly-size 
circuits composed of CNOT and $e^{i\theta X}$ gates, acting on product inputs and followed by 
measurement of Z on any single qubit, can be simulated classically. This result is of interest 
since it is known that CNOT together with any single real one-qubit gate V such that $V^2$ 
is not basis-preserving, is universal for quantum computation [18]. In contrast to this, here 
it is found that there is a class of non-trivial complex gates $e^{i\theta X}$ that can be added to the 
CNOT gate while retaining efficient classical simulation. 

The above result is also interesting from a conceptual point of view. In particular, its proof 
will follow from a variant of theorem 1 where states $|\psi\rangle$ and operations $U^\dagger O U$ are considered 
that are CT, resp. ECS, with respect to bases other than the standard basis. Letting $|\psi\rangle$ 
be a product input and U a poly-size circuit composed of CNOT and $e^{i\theta X}$ gates, it will 
be shown that $|\psi\rangle$ and $U^\dagger Z_1 U$ are CT, resp. ECS, with respect to the $\{|\pm\rangle\}$ basis of X 
eigenstates. Hence, viewing the entire computation in this basis and applying theorem 1 
shows that classical simulation is efficiently possible. In contrast, a direct application of 
theorem 1, i.e. with respect to the standard basis, is not possible as $U^\dagger Z_1 U$ is generally not 
ECS w.r.t. this basis. 






Classical simulation of quantum algorithms 

In a second part of the paper, the above results are applied in the context of quantum algorithms. 
Depending on the case at hand, the goal will be to either show that certain algorithms can be 
simulated classically or to deepen our insight into why certain algorithms achieve exponential 
(oracle) speed-ups over classical computation. We will analyze three different quantum algorithms: 

• a quantum algorithm to estimate partition functions of classical lattice models [19]; 

• a general class of quantum algorithms containing the Deutsch-Jozsa algorithm [20]; 

• Simon's algorithm [17]. 

The first two classes of quantum algorithms will be proved to be classically simulatable using the 
methods developed in this paper. We refer to the relevant sections in the text for a discussion. For 
the time being, we limit ourselves to discussing our results in the context of Simon's algorithm, 
which we consider the most interesting application. 

Recall that in Simon's problem one has oracle access to a function $f : \{0,1\}^n \to \{0,1\}^n$; it is 
promised that there exists an unknown n-bit string a such that $f(x) = f(y)$ if and only if $y = x + a$ 
(addition modulo 2). The goal is to find a. Classically one needs at least $\Omega(2^{n/2})$ queries, whereas a 
quantum computer can solve the problem with $O(n)$ queries, i.e. Simon's algorithm achieves an 
exponential oracle separation between BQP and BPP. In spite of its computational power, Simon's 
algorithm is implemented with very simple circuitry, as displayed in Fig. 2. What are the essential 
ingredients responsible for the power of this algorithm? 

In standard considerations, the interplay between the Fourier transform (i.e. the second layer 
of Hadamards in Fig. 2) and the oracle $f$ is emphasized. After the oracle is applied, the system 
is in the state $\sum_x |x\rangle |f(x)\rangle$. The Fourier transform then creates interference in the system and 
"picks out" the relevant computational basis states, such that a subsequent measurement of the 
system yields the desired information about the unknown bit-string a. This rather delicate relation 
between oracle and Fourier transform is usually considered to be among the main origins of the 
hardness of classically simulating Simon's algorithm. In this work we will show that this point of 
view is not the end of the story: in particular, we will find that the interplay between the same 
Fourier transform and the function computed during the round of classical postprocessing is an 
equally important element in the speed-up achieved by the algorithm. Specifically, we will prove 
the following result. 

Theorem 2 (rough version) Consider a quantum circuit displaying the structure as depicted 
in Fig. 2. If the function computed in the round of classical postprocessing is promised to have a 
sufficiently "peaked" Fourier spectrum, then the entire circuit can be simulated efficiently classically, 
independent of the specific forms of the other rounds. 

Thus, if the final classical round in Simon's algorithm happened to regard a function with 
sufficiently peaked Fourier spectrum, then the entire quantum computation could be simulated 
efficiently, independent of the details of e.g. the oracle $f$ computed in an earlier stage of the 
computation, and independent of e.g. the entanglement produced by the quantum circuit. This 
result hence exposes the double role played by the Fourier transform, which is to act 
appropriately on both the oracle $f$ and the function computed in the postprocessing, in order to achieve 
a quantum speed-up. These observations highlight that the power of a quantum algorithm can 
only be understood by taking the entire computation into account including the classical 
postprocessing round, even though the latter may at first sight look rather innocuous. Indeed, note 
that, strikingly, in Simon's algorithm this round 'only' involves solving a simple system of linear 
equations over $\mathbb{Z}_2$! Nevertheless, this simple classical computation is associated with a function 
having a very flat spectrum (as we will see), hence ensuring the exponential speed-up achieved by 
Simon's algorithm. 
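
The linear-algebra step mentioned above can be sketched as follows. This is a minimal illustration, not the paper's own formulation: it relies on the standard textbook fact (not stated in this section) that each measurement outcome y in Simon's algorithm satisfies y·a = 0 (mod 2), and the function name `solve_simon` and the encoding of bit strings as Python integers are assumptions made here.

```python
def solve_simon(ys, n):
    """Recover the hidden n-bit string a (as an int) from outcomes ys.

    Assumes every y in ys satisfies y.a = 0 (mod 2), where y.a is the parity
    of the bitwise AND. Returns None when the outcomes do not yet span an
    (n-1)-dimensional space, i.e. when a is not uniquely determined.
    """
    pivots = {}  # pivot bit position -> row, kept in reduced row echelon form
    for y in ys:
        # Eliminate all existing pivot positions from the new row.
        for pos, row in pivots.items():
            if (y >> pos) & 1:
                y ^= row
        if y:
            pos = y.bit_length() - 1
            # Clear the new pivot position in the existing rows.
            for p in list(pivots):
                if (pivots[p] >> pos) & 1:
                    pivots[p] ^= y
            pivots[pos] = y
    if len(pivots) != n - 1:
        return None
    # Exactly one column has no pivot; set that bit of a to 1 and back-substitute.
    free = next(j for j in range(n) if j not in pivots)
    a = 1 << free
    for pos, row in pivots.items():
        if (row >> free) & 1:
            a |= 1 << pos
    return a
```

For instance, with n = 3 and hidden string a = 101, the outcomes 010, 111 and 101 are all orthogonal to a, and Gaussian elimination over $\mathbb{Z}_2$ recovers a from them in time polynomial in n.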

Remark: in the formulation of theorem 2, no knowledge of the Fourier spectrum of the function 
in question is assumed, except the promise that this spectrum is "peaked" . Using remarkable results 
of Boolean learning theory, enough information of the spectrum can be efficiently reconstructed in 
order to achieve the poly-time classical simulation as stated in the theorem. ◇ 

Finally, also the factoring algorithm can be implemented with a circuit displaying the structure 
of Fig. 2. Therefore, the classical postprocessing plays a similar crucial role also in this algorithm. 
As the technical considerations in Simon's algorithm are more transparent than in Shor's, here we 
will focus on the former — keeping in mind that our conclusions also apply to the latter. 

Matchgate circuits and poly-time classical computation 

Somewhat unrelated to the above context, we prove a "byproduct result" that we find noteworthy. 
We will arrive at a complexity-theoretic result regarding the computational power of matchgate 
circuits. Roughly speaking, we will show the following (see theorem 4 for a precise statement): 

The class of functions that can be efficiently computed by nearest-neighbor matchgate circuits 
is strictly contained within P. 

Perhaps the most interesting aspect regarding this result here is its proof method. Surprisingly, 
the result will be obtained by combining the classical query lower bound of Simon's problem with 
our theorem 1. In particular, we will show that if the class of matchgate-computable functions 
comprised all of P, then a quantum algorithm for Simon's problem would exist which turns out to 
be efficiently simulatable classically (using theorem 1). Hence an efficient classical algorithm would 
exist which solves Simon's problem with poly(n) classical oracle queries, yielding a contradiction. 
Remark that it is striking how utterly unrelated matchgate circuits and Simon's problem seem at 
first sight! 

Some conventions 

In this paper, when we refer to a quantum circuit, we will always implicitly mean a uniformly 
generated family of quantum circuits. Further, by observable we mean any Hermitian operator 
$O$ with $\|O\| \le 1$, where $\|\cdot\|$ denotes the spectral norm. When a measurement of an observable 
is considered at the end of a quantum circuit, we will always implicitly assume that this regards 
an observable that can be measured efficiently. The notion of 'simulation' will be synonymous to 
'classical simulation'. The notion 'efficient' will be synonymous to 'in polynomial time'. For clarity, 
all results are stated in terms of qubit systems, but generalizations to arbitrary finite-dimensional 
quantum systems are immediate. Our standard notation for the computational basis of an n-qubit 
system will be $\{|x\rangle\}$, where $x = (x_1, \dots, x_n)$ ranges over all n-bit strings and $|x\rangle = |x_1\rangle \otimes \dots \otimes |x_n\rangle$. 

3 Classical simulation of quantum computation 

In this section we discuss the definition of classical simulation that will be adopted in the present 
work. Suppose that an n-qubit poly-size quantum circuit produces an output state $|\psi_{out}\rangle$ and 
is followed by a measurement of an observable O, assuming that O can efficiently be measured. 
Then, repeating the computation K = poly(n) times, recording the measurement outcome $o_i$ in 
each run (i.e. each $o_i$ is one of the eigenvalues of O) one obtains an estimate $\sigma = K^{-1} \sum_{i=1}^{K} o_i$ of 
the expectation value $\langle O\rangle = \langle\psi_{out}|O|\psi_{out}\rangle$. The accuracy of this approximation is dictated by the 
Chernoff-Hoeffding bound (we refer to the Appendix for a statement and discussion of this bound). 
In particular, this bound implies the following: for every $\epsilon = 1/p(n)$, where p(n) represents an 
arbitrary polynomial in n, there exists a K that scales as a suitable polynomial in n such that the 
inequality $|\sigma - \langle O\rangle| \le \epsilon$ holds with a probability that is exponentially (in n) close to 1. In other 
words, by taking poly(n) runs of the computation (and this is all that is allowed in an efficient 
quantum computation) it is possible to estimate $\langle O\rangle$ with an error that scales as an arbitrary 
inverse polynomial in n. We denote this type of estimate as an approximation with 'polynomial 
accuracy' or a 'polynomial approximation'. Note that a polynomial approximation achieves an 
estimate of $\langle O\rangle$ up to $O(\log n)$ significant bits. 
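
The scaling of the number of runs can be illustrated with a short Python sketch. It assumes the standard two-sided Hoeffding inequality for outcomes in [-1, 1] (which holds since the eigenvalues of O have modulus at most 1); the exact form of the bound used in the Appendix may differ. The names `runs_needed`, `estimate_expectation` and `toy_outcome` are hypothetical.

```python
import math
import random


def runs_needed(eps: float, delta: float) -> int:
    """Smallest K for which Hoeffding's inequality, applied to outcomes in
    [-1, 1], guarantees Prob(|mean - <O>| >= eps) <= 2 exp(-K eps^2 / 2) <= delta."""
    return math.ceil(2.0 * math.log(2.0 / delta) / eps**2)


def estimate_expectation(sample_outcome, eps, delta, rng):
    """Average K independent outcomes; sample_outcome(rng) returns one outcome."""
    K = runs_needed(eps, delta)
    return sum(sample_outcome(rng) for _ in range(K)) / K


def toy_outcome(rng: random.Random) -> int:
    """Illustrative measurement: +1 with probability 0.8, else -1, so <O> = 0.6."""
    return 1 if rng.random() < 0.8 else -1
```

Under this bound, choosing $\epsilon = 1/p(n)$ and failure probability $\delta = 2^{-n}$ gives $K = O(n\, p(n)^2)$ runs, which is polynomial in n, in line with the statement above.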

The above method hence represents an efficient quantum algorithm to estimate $\langle O\rangle$ with 
polynomial accuracy with a success probability that lies exponentially close to 1. We now say that 
this quantum algorithm can be efficiently simulated classically if there exists an efficient classical 
algorithm to provide a polynomial approximation of $\langle O\rangle$, again with a probability that lies 
exponentially close to 1. That is, we require the classical simulation algorithm to approximate $\langle O\rangle$ 
in poly-time with the same accuracy that is achieved by the quantum algorithm. This notion of 
simulation is sometimes called weak simulation. The latter is to be regarded as opposed to the 
much more stringent requirement of strong simulation, where it is asked to construct a classical 
algorithm to approximate $\langle O\rangle$ in poly(m, n) time up to m significant bits (i.e. with exponential 
precision). 

Note that the notion of weak simulation is more true to the concept of what a classical simulation 
actually constitutes since, colloquially speaking, it requires the classical simulation to achieve 
'the same result' as the quantum algorithm. In contrast, in the strong scenario one is asked 
to construct an efficient classical algorithm that approximates $\langle O\rangle$ far more accurately than the 
quantum algorithm itself could generally achieve in polynomial time. Even though it has been 
realized previously that the weak variant is a valid and natural notion of classical simulation of 
QC (see e.g. dJ[H]), it seems that this notion is to date largely unexplored. In particular, the 
vast majority of classical simulation results use the strong variant. In [10] it was pointed out that 
there exist simple examples of quantum circuits for which weak classical simulation is possible 
with elementary methods, whereas strong simulation of the same circuits is a #P-hard problem 
and hence intractable. This highlights the presence of a significant gap between strong and weak 
simulation. 

Remark: When the notion of polynomial approximation is used in the following, we will always 
mean a polynomial approximation which is achieved with a probability that is exponentially close 
to one. ◇ 

4 Computationally tractable states 

The objective of this section is to develop the notion of computationally tractable (CT) states and 
to prove theorem 1. To do this, we first define CT states and discuss some of their elementary 
properties; this is done in section 4.1. In section 4.2 we consider basis-preserving operations, which 
are identified as a class of operations that map CT states to CT states. In section 4.3 we consider 
sparse operations; the main technical contribution in this section is theorem 3 regarding the efficient 
classical estimation of matrix elements $\langle\varphi|A|\psi\rangle$, where $|\varphi\rangle$ and $|\psi\rangle$ are computationally tractable 
and A is an (efficiently computable) sparse operation. This theorem will immediately lead to the 
proof of theorem 1. 

4.1 Definition of CT states 

Throughout this paper, we will deal with n-qubit state families $\{|\psi_n\rangle : n = 1, 2, \dots\}$, where $|\psi_n\rangle$ 
is an n-qubit state. When considering such a state family $\{|\psi_n\rangle\}$, we will mostly refer to a single 
state $|\psi_n\rangle \equiv |\psi\rangle$ with the silent assumption that this actually denotes a family. We now consider 
the following definition. 

Definition 1 An n-qubit state $|\psi\rangle$ is called 'computationally tractable' (CT) if the following 
conditions hold: 

(a) it is possible to sample in poly(n) time with classical means from the probability distribution 
$\mathrm{Prob}(x) = |\langle x|\psi\rangle|^2$ on the set of n-bit strings x, and 

(b) upon input of any bit string x, the coefficient $\langle x|\psi\rangle$ can be computed in poly(n) time on a 
classical computer. 

For convenience, in (b) we require the coefficients $\langle x|\psi\rangle$ to be computable with perfect precision, a 
notion which may lead to rather pathological situations when e.g. irrational numbers are involved. 
The results in this paper can however straightforwardly be generalized to the case where $\langle x|\psi\rangle$ 
can be computed efficiently with exponential precision, i.e. up to m significant bits in poly(n, m) 
time. As in the present work the distinction between these two types of accuracies is not essential 
(in contrast to the distinction between polynomial and exponential precision, which is crucial), for 
clarity we state all results w.r.t. the notion of perfect accuracy. Also in other places in the text 
where we refer to 'perfect accuracy', the results in question immediately generalize to the case of 
exponential precision. 

Note that (a) and (b) are highly dependent on the classical description of the state $|\psi\rangle$ that is 
provided. Therefore, strictly speaking it would be more precise to call a state $|\psi\rangle$ CT relative to 
this classical description. In this paper we will only encounter situations where each state has a 
natural (efficient) description that will be obvious from the context. It will always be assumed that 
this particular description is provided. For example, the classical description of a state generated 
by a poly-size quantum circuit acting on, say, the all-zeroes input, will always be assumed to 
be the circuit that generates the state. As another example, for every complete product state 
$|\psi\rangle = |\psi_1\rangle \otimes \dots \otimes |\psi_n\rangle$ we will assume $|\psi\rangle$ to be specified in terms of the 'obvious' description of 
$|\psi\rangle$ consisting of the 2n complex coefficients $\langle 0|\psi_i\rangle$ and $\langle 1|\psi_i\rangle$. 
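
As a minimal illustration of conditions (a) and (b), the following Python sketch treats a product state given by its 2n single-qubit coefficients. The function names `coefficient` and `sample`, and the representation of the state as a list of pairs, are assumptions made for this example; each single-qubit state is assumed to be normalized.

```python
import random


def coefficient(amps, x):
    """Condition (b): <x|psi> for |psi> = tensor product over i of
    (amps[i][0]|0> + amps[i][1]|1>), computed with n multiplications."""
    c = 1 + 0j
    for (a0, a1), bit in zip(amps, x):
        c *= a1 if bit else a0
    return c


def sample(amps, rng):
    """Condition (a): draw x with probability |<x|psi>|^2. For a product
    state the bits are independent, so each is drawn separately."""
    return tuple(1 if rng.random() < abs(a1) ** 2 else 0 for (_a0, a1) in amps)
```

Both routines run in time linear in n, which is why product states are CT with respect to this description.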

Even though conditions (a) and (b) are similar in nature, we provide evidence that these 
conditions are incomparable. In particular, the following complexity theoretic argument implies 
that it is highly likely that there exist states satisfying (b) but not (a). Consider any efficiently 
computable function $f : \{0,1\}^n \to \{0,1\}$ for which it is promised that there exists a unique $x_0$ 
such that $f(x_0) = 1$, and define the n-qubit state $|\psi\rangle = \sum_x f(x)|x\rangle = |x_0\rangle$. Note that the state 
$|\psi\rangle$ satisfies condition (b). Assuming that (b) implies (a), it follows that it is possible to efficiently 
sample from the distribution $\{|\langle x|\psi\rangle|^2\}$. But this distribution assigns a zero probability to each bit 
string x except $x_0$, which has unit probability. Hence, the possibility of efficiently sampling from 
this distribution implies that $x_0$ can be determined efficiently. Regarding $f$ as a verifier circuit for 
an NP problem, it would immediately follow that every problem in NP with a unique witness is in 
P. This last property is not likely to be true pT] , 

Next we state a useful sufficient (but not necessary) criterion to assess whether condition (a) 
holds for a given state. To state this result, we need the following notation. For an n-qubit state 
$|\psi\rangle$, let $p_{S,y}(|\psi\rangle) \equiv p_{S,y}$ denote the probability of obtaining the bit string $y = (y_i : i \in S)$ as an 
outcome when measuring the qubits in the set $S \subseteq \{1, \dots, n\}$. We can then state the following 
lemma; a proof can be found in e.g. [11]. 

Lemma 1 Let $|\psi\rangle$ be an n-qubit state. Suppose that, on input of an arbitrary S and y, the 
probability $p_{S,y}$ can be computed in poly(n) time. Then it is possible to sample in poly(n) time from 
the probability distribution $\{|\langle x|\psi\rangle|^2\}$. 
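
One natural way to realize this lemma is to sample the bits one at a time, using the marginals to obtain each conditional probability. The Python sketch below illustrates this idea; it is not claimed to be the construction of the cited proof. The names `sample_from_marginals` and `ghz_marginal` are hypothetical, and the marginal oracle is passed in as a function (qubits are indexed from 0 here).

```python
import random


def sample_from_marginals(n, marginal, rng):
    """Sample x in {0,1}^n using n calls to marginal(S, y).

    marginal(S, y) must return the probability that measuring the qubits
    listed in the tuple S yields the bits in the tuple y.
    """
    prefix = ()
    p_prefix = 1.0  # probability of the bits sampled so far
    for i in range(n):
        S = tuple(range(i + 1))
        p0 = marginal(S, prefix + (0,))
        # Bit i is 0 with conditional probability p0 / p_prefix.
        bit = 0 if rng.random() * p_prefix < p0 else 1
        p_prefix = p0 if bit == 0 else p_prefix - p0
        prefix += (bit,)
    return prefix


def ghz_marginal(S, y):
    """Marginals of the state (|0...0> + |1...1>)/sqrt(2), as a toy oracle."""
    return 0.5 if len(set(y)) == 1 else 0.0
```

With the toy oracle `ghz_marginal`, the sampler returns only the all-zeros or the all-ones string, each with probability 1/2, as expected for this state.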

Several important state families turn out to be computationally tractable, as illustrated next. 
• Examples of computationally tractable states: 



- Product states are trivially CT. 

- Every state of the form $|\psi\rangle \propto \sum_x e^{i\theta(x)}|x\rangle$, where the sum is over all n-bit strings x and 
where $x \mapsto \theta(x) \in \mathbb{R}$ represents an arbitrary efficiently computable function, is trivially CT. 
Every state obtained by applying a poly-size circuit family consisting of Toffoli gates to an 
arbitrary product state is computationally tractable as well, as can easily be proved (this 
property will also follow from lemma 2). 

- Every matrix product state (MPS) of polynomial bond dimension is CT. A state is an 
MPS of poly bond dimension if there exist 2n $N \times N$ matrices $A_i[0]$, $A_i[1]$ ($i = 1, \dots, n$), with N = poly(n), 
such that $\langle x|\psi\rangle = \mathrm{Tr}(A_1[x_1] \cdots A_n[x_n])$, for every n-bit string $x = (x_1, \dots, x_n)$. Property (b) 
follows immediately from this definition. Property (a) holds since the conditions of lemma 
1 are satisfied for all MPS of polynomial bond dimension [35]. Tree tensor states [33] are 
generalizations of MPS with similar properties and are also computationally tractable. 

- A Clifford circuit is a quantum circuit composed of Hadamard, CNOT and PHASE gates, 
where PHASE = diag(1, i). An n-qubit stabilizer state is any state that is generated by 
applying a poly-size Clifford circuit to the state $|0\rangle^{\otimes n}$. Every stabilizer state is a CT state. 
Property (a) is the content of the Gottesman-Knill theorem [6]. Property (b) is proved in [7] 
(see also [10]). 

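- A (unitary, two-qubit) matchgate G is any two-qubit gate of the form 

$$G = \begin{pmatrix} a & 0 & 0 & b \\ 0 & u & v & 0 \\ 0 & x & y & 0 \\ c & 0 & 0 & d \end{pmatrix}, \qquad A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}, \quad B = \begin{pmatrix} u & v \\ x & y \end{pmatrix},$$

where $A, B \in SU(2)$. Every state obtained by applying a poly-size matchgate circuit to a 
computational basis state, where all gates are restricted to act on nearest neighbors (assuming 
a one-dimensional ordering of the qubits) is a computationally tractable state. Properties 
(a) and (b) are proved in [11]. 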

- Any n-qubit state that is obtained by applying the quantum Fourier transform (over the 
integers modulo $2^n$) to an arbitrary product state, is a CT state. See e.g. [23] for a simple 
proof of this property (see also [25, 26] for related results). 

- We briefly mention a general class of classical simulation results related to efficient tensor 
contraction schemes. This approach relies on the topology of (a graph associated with) 
the quantum circuit in question. If this topology displays a sufficiently tree-like structure 
(quantified in terms of the graph invariant tree-width) then classical simulation of such circuits 
can be achieved [27]. It can be shown that the output states of quantum circuits with 
logarithmically scaling tree-width (acting on product input states), are CT states; the proof 
is essentially contained in [27] and is omitted here (see also [4] for related work). 
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4.2 Basis-preserving operations 

Next we investigate which operations map the family of CT states to itself. In this context, the
operations that preserve the computational basis play an important role. An n-qubit operation M
is called 'basis-preserving' if every computational basis state $|x\rangle$ is mapped to
$M|x\rangle = \gamma_x |\pi(x)\rangle$ for some permutation $\pi$ of the set of n-bit strings and some
complex $\gamma_x$. The operation M is efficiently computable if the functions $x \to \gamma_x$,
$x \to \pi(x)$ and $x \to \pi^{-1}(x)$ can be evaluated in poly(n) time.
For example, every Pauli product [28] is efficiently computable basis-preserving, as well as every
operation of the form $O = \sum_x (-1)^{f(x)} |x\rangle\langle x|$, where $f : \{0,1\}^n \to \{0,1\}$ is an
efficiently computable function. Also every poly-size circuit composed of elementary basis-preserving
gates (e.g. Toffoli gates, diagonal gates) is efficiently computable basis-preserving.

The relevance of efficiently computable basis-preserving unitary operations in the present context
is that these operations preserve the class of CT states:

Lemma 2 If $|\psi\rangle$ is a computationally tractable n-qubit state and if M is an efficiently
computable unitary basis-preserving operation, then $|\psi'\rangle = M|\psi\rangle$ is again
computationally tractable.

Proof: Let the permutation $\pi$ and the coefficients $\gamma_x$ be defined as above. Note that
$|\gamma_x| = 1$ for every x since M is unitary. The coefficients of $|\psi'\rangle$ are given by
$\langle x|\psi'\rangle = \gamma_{\pi^{-1}(x)} \langle \pi^{-1}(x)|\psi\rangle$.
Property (b) now follows immediately from the properties that M is efficiently computable and
that $|\psi\rangle$ is CT. To show (a), we have to find an efficient classical method to sample from the
probability distribution defined by
$\mathrm{Prob}(x) = |\langle x|\psi'\rangle|^2 = |\langle \pi^{-1}(x)|\psi\rangle|^2$. To do so, consider the
following procedure. First sample from the distribution $\{|\langle y|\psi\rangle|^2\}$, yielding a bit
string y with probability $|\langle y|\psi\rangle|^2$, and subsequently output the bit string
$x := \pi(y)$. This procedure is efficient since $|\psi\rangle$ is CT and $y \to \pi(y)$ is efficiently
computable. Moreover, every bit string x is generated with probability
$|\langle \pi^{-1}(x)|\psi\rangle|^2$ as desired. □
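The sampling step of this proof can be sketched in a few lines of Python. This is only an illustration under assumptions that are not part of the text: `sample_psi` and `pi` are hypothetical stand-ins for the efficient sampler of $|\psi\rangle$ and for the permutation $\pi$, chosen here as a 3-qubit product state with one biased qubit and a cyclic shift of the bits.

```python
import random
from collections import Counter

N_BITS = 3  # toy size, chosen for illustration only

def sample_psi(rng):
    """Hypothetical sampler for |<y|psi>|^2: psi is a product state whose
    first bit is 1 with probability 0.8 and whose other bits are uniform."""
    first = 1 if rng.random() < 0.8 else 0
    rest = [rng.randint(0, 1) for _ in range(N_BITS - 1)]
    return tuple([first] + rest)

def pi(y):
    """Hypothetical efficiently computable permutation: cyclic left shift."""
    return y[1:] + y[:1]

def sample_psi_prime(rng):
    """Sample x with probability |<pi^{-1}(x)|psi>|^2: draw y, output pi(y)."""
    return pi(sample_psi(rng))

rng = random.Random(0)
shots = 20000
counts = Counter(sample_psi_prime(rng) for _ in range(shots))
# After the shift, the biased bit sits in the last position.
freq_last_bit_one = sum(c for x, c in counts.items() if x[-1] == 1) / shots
```

Note that the phases $\gamma_x$ never enter the sampler; they matter only for property (b).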

Note that the basis-preserving operation M may drastically change the entanglement properties
of $|\psi\rangle$. Consider e.g. the case where $|\psi\rangle$ is a complete product state and M a poly-size
circuit of CPHASE and/or Toffoli operations, yielding a state $|\psi'\rangle$ that may be highly entangled.
Nevertheless, both $|\psi\rangle$ and $|\psi'\rangle$ are CT and equal up to a basis-preserving operation.

4.3 Sparse operations 

Next we consider sparse operations. Such operations are sufficiently close to basis-preserving
operations that their action on CT states remains manageable. An n-qubit operation A is s-sparse
if for every basis state $|x\rangle$, each of the vectors $A|x\rangle$ and $A^\dagger|x\rangle$ is a linear
combination of at most s computational basis states. The quantity s is called the sparseness of A.
We will consider n-qubit operations A (both unitary operations and observables) with sparseness
$s \le \mathrm{poly}(n)$, which will simply be called 'sparse operations'. Note that the notion of
sparseness is defined w.r.t. the number of nonzero entries per row/column and not the total number
of nonzero entries in the matrix, the latter not being required to be small. In particular, a sparse
n-qubit operation generically has a total number of nonzero entries that scales exponentially with n.

For every s-sparse n-qubit operation A, define 2s functions $\alpha_i : \{0,1\}^n \to \mathbb{C}$ and
$r_i : \{0,1\}^n \to \{0,1\}^n$ ($i = 1, \dots, s$) as follows: the n-bit string $r_i(x)$ is defined to be
the row index of A associated with the i-th nonzero entry in the column indexed by x (when traversing
this column from top to bottom), if an i-th nonzero entry exists within this column; we denote this
entry by $\alpha_i(x)$. If an i-th nonzero entry does not exist in this column, then $r_i(x)$ is set to be
the all-zeroes string and $\alpha_i(x)$ is set to zero. With the above definitions, one simply has

$$A|x\rangle = \alpha_1(x)|r_1(x)\rangle + \dots + \alpha_s(x)|r_s(x)\rangle. \qquad (2)$$

Similar definitions can be given regarding the rows of A, leading to 2s functions
$\beta_i : \{0,1\}^n \to \mathbb{C}$ and $c_i : \{0,1\}^n \to \{0,1\}^n$ ($i = 1, \dots, s$) that are the natural
counterparts of the $\alpha_i$ and $r_i$, respectively.

A sparse n-qubit operation A is efficiently column-computable if, on input of an arbitrary n-bit
string x, it is possible to list the (at most s = poly(n)) nonzero entries within the column of A
indexed by x together with the row indices associated with each of these nonzero entries, all in
poly(n) time. Equivalently, A is efficiently column-computable if it is possible to compute the
2s quantities $\alpha_i(x)$ and $r_i(x)$ ($i = 1, \dots, s$) in poly-time. The operation A is called
efficiently row-computable if $A^\dagger$ is efficiently column-computable. Finally, A is called
efficiently computable if it is both efficiently row- and column-computable. All efficiently
computable sparse unitary operations can be implemented efficiently on a quantum computer [29].
In this paper we will only consider sparse operations that are efficiently computable.

The following are some examples of efficiently computable sparse operations. 

• Examples of efficiently computable sparse (ECS) operations: 

- Every efficiently computable basis-preserving operation is ECS. 

- Every d-qubit gate G acting within an n-qubit circuit, represented by the matrix $G \otimes I$
where $I$ denotes the identity acting on $n - d$ qubits, is $2^d$-sparse. If $d = O(\log n)$ then such
an operation is ECS.

- Every operation that is a linear combination of poly(n) ECS operations is ECS. It follows that
every operator $H = \sum_{i=1}^m H_i$ which is a sum of $m = \mathrm{poly}(n)$ d-local observables $H_i$
(with $d = O(\log n)$) is ECS. This means that observables such as Hamiltonians and correlation
operators are typically ECS.

- Let U represent an n-qubit poly-size circuit of basis-preserving elementary gates (e.g. Toffoli,
CNOT, PHASE, CPHASE, etc.), interspersed with k gates $V_1, \dots, V_k$ at arbitrary places in
the circuit, each of which acts on at most d qubits. It is required that $kd = O(\log n)$;
otherwise the $V_i$ are arbitrary. Then U is ECS. To see this, expand each gate $V_i$ as a linear
combination of $4^d$ Pauli products and note that every Pauli product is efficiently computable
basis-preserving. Consequently, U can be written as a linear combination of $4^{dk} = \mathrm{poly}(n)$
efficiently computable basis-preserving operations, showing that U is ECS.

- ECS operations often arise in the context of quantum algorithms, related e.g. to unitary
group representations; see e.g. [29] and references within.
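To make the example of a d-qubit gate $G \otimes I$ concrete, the following sketch lists the nonzero entries of one column without ever forming the $2^n \times 2^n$ matrix. The function name `column_entries`, the 0-based qubit labels and the bit-tuple representation of x are assumptions made for this illustration only.

```python
import numpy as np

def column_entries(gate, targets, x):
    """Nonzero entries of the column of (gate on `targets`, identity elsewhere)
    indexed by the bit tuple x, returned as a list of (alpha, r) pairs.
    The cost is O(2^d) for a d-qubit gate, independent of the 2^n matrix size."""
    d = len(targets)
    # Column index of the small gate matrix: the bits of x on the target qubits.
    col = 0
    for t in targets:
        col = (col << 1) | x[t]
    entries = []
    for row in range(2 ** d):
        alpha = gate[row, col]
        if alpha != 0:
            r = list(x)
            # Overwrite the target bits of x with the bits of the row index.
            for k, t in enumerate(targets):
                r[t] = (row >> (d - 1 - k)) & 1
            entries.append((complex(alpha), tuple(r)))
    return entries

H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
x = (0, 1, 1, 0, 1)               # a 5-bit basis state
cols = column_entries(H, [1], x)  # Hadamard on qubit 1 (0-based)
```

Applying the same function to the conjugate transpose of the gate would give the row data, which is what efficient row-computability asks for.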

We are now in a position to state the following result, which constitutes the main technical 
ingredient in this work regarding the use of sampling techniques in classical simulation. 

Theorem 3 Let $|\varphi\rangle$ and $|\psi\rangle$ be CT n-qubit states and let A be an efficiently
computable sparse (not necessarily unitary) n-qubit operation with $\|A\| \le 1$. Then there exists
an efficient classical algorithm to approximate $\langle\varphi|A|\psi\rangle$ with polynomial accuracy.

Note that theorem 1 immediately follows from theorem 3. Before proving this result in its most
general form, as a warm-up we prove a special instance, taking A to be the identity. Hence, we
are concerned with the estimation of overlaps between CT states. This special case is proved
beforehand to illustrate the sampling methods used in this work, without the more technically
involved arguments required in the proof of theorem 3. Thus, we set out to prove the following
property, formulated in terms of a lemma.

Lemma 3 Let $|\psi\rangle$ and $|\varphi\rangle$ be two CT n-qubit states. Then there exists an efficient
classical algorithm to approximate $\langle\varphi|\psi\rangle$ with polynomial accuracy.

Proof: Denote $p_x := |\langle x|\psi\rangle|^2$ and $q_x := |\langle x|\varphi\rangle|^2$. Since $|\psi\rangle$
and $|\varphi\rangle$ are CT states, it is possible to sample efficiently from the probability
distributions $\{p_x\}$ and $\{q_x\}$. Define the function $\delta : \{0,1\}^n \to \{0,1\}$ by
$\delta(x) = 1$ if $p_x \ge q_x$ and $\delta(x) = 0$ otherwise, for every n-bit string x, and define
$\epsilon = 1 - \delta$. Then $\delta$ and $\epsilon$ can be evaluated efficiently since $p_x$ and $q_x$ can be
efficiently evaluated by assumption (b) in the definition of CT states. The overlap
$\langle\varphi|\psi\rangle$ is therefore equal to

$$\langle\varphi|\psi\rangle = \sum_x \langle\varphi|x\rangle\langle x|\psi\rangle\,\delta(x)
  + \sum_x \langle\varphi|x\rangle\langle x|\psi\rangle\,\epsilon(x), \qquad (3)$$

where the sums are over all n-bit strings x. Defining the functions F and G by

$$F(x) = \frac{\langle\varphi|x\rangle\langle x|\psi\rangle}{p_x}\,\delta(x), \qquad
  G(x) = \frac{\langle\varphi|x\rangle\langle x|\psi\rangle}{q_x}\,\epsilon(x), \qquad (4)$$

we have $\langle\varphi|\psi\rangle = \langle F\rangle + \langle G\rangle$ where
$\langle F\rangle = \sum_x p_x F(x)$ and $\langle G\rangle = \sum_x q_x G(x)$. It follows from assumption (b)
in the definition of CT states that F and G can be efficiently evaluated. Furthermore,
both $|F(x)|$ and $|G(x)|$ are not greater than 1. It thus follows from the Chernoff-Hoeffding bound
that both $\langle F\rangle$ and $\langle G\rangle$ can be approximated efficiently with polynomial accuracy.
This implies that $\langle\varphi|\psi\rangle$ can be estimated with polynomial accuracy as well. This
completes the proof. □
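The estimator constructed in this proof can be sketched as follows. This is a minimal illustration, not the paper's implementation: the class `ProductState`, the helper `qubit` and the function `estimate_overlap` are hypothetical names, and product states are used as the simplest CT states because their amplitudes and sampling are trivial to code.

```python
import math
import random

class ProductState:
    """Toy CT state: a tensor product of normalized single-qubit states."""
    def __init__(self, qubits):
        self.qubits = qubits  # list of (amp0, amp1) with |amp0|^2 + |amp1|^2 = 1

    def amp(self, x):
        """Property (b): compute <x|state> exactly."""
        out = 1.0 + 0.0j
        for (a0, a1), bit in zip(self.qubits, x):
            out *= a1 if bit else a0
        return out

    def sample(self, rng):
        """Property (a): draw x with probability |<x|state>|^2."""
        return tuple(1 if rng.random() < abs(a1) ** 2 else 0
                     for (_, a1) in self.qubits)

def estimate_overlap(phi, psi, shots, rng):
    """Estimate <phi|psi> as <F> + <G>, following the proof of lemma 3."""
    f_sum, g_sum = 0j, 0j
    for _ in range(shots):
        # <F>: sample x from p_x = |<x|psi>|^2; the term survives if p_x >= q_x.
        x = psi.sample(rng)
        a_psi, a_phi = psi.amp(x), phi.amp(x)
        p, q = abs(a_psi) ** 2, abs(a_phi) ** 2
        if p >= q:
            f_sum += a_phi.conjugate() * a_psi / p
        # <G>: sample x from q_x = |<x|phi>|^2; the term survives if p_x < q_x.
        x = phi.sample(rng)
        a_psi, a_phi = psi.amp(x), phi.amp(x)
        p, q = abs(a_psi) ** 2, abs(a_phi) ** 2
        if p < q:
            g_sum += a_phi.conjugate() * a_psi / q
    return (f_sum + g_sum) / shots

def qubit(theta, phase=0.0):
    """Single-qubit state cos(theta)|0> + e^{i phase} sin(theta)|1>."""
    return (complex(math.cos(theta)),
            math.sin(theta) * complex(math.cos(phase), math.sin(phase)))

psi = ProductState([qubit(0.3 + 0.1 * j) for j in range(6)])
phi = ProductState([qubit(0.5 + 0.05 * j, 0.2) for j in range(6)])

# Exact overlap of two product states, used only to check the estimate.
exact = 1 + 0j
for (a0, a1), (b0, b1) in zip(psi.qubits, phi.qubits):
    exact *= b0.conjugate() * a0 + b1.conjugate() * a1

estimate = estimate_overlap(phi, psi, 20000, random.Random(1))
```

The estimator only ever calls `amp` and `sample`, so any other pair of CT states (for instance a stabilizer state and an MPS) could be plugged in, provided those two methods are available.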

Lemma 3 shows that the overlap $\langle\varphi|\psi\rangle$, representing a 'joint' property of the states
$|\psi\rangle$ and $|\varphi\rangle$, may be estimated efficiently classically even when only an efficient
simulation of quantum processes resulting in $|\psi\rangle$ and $|\varphi\rangle$ individually is available;
in particular, the techniques leading to the proofs of (a)-(b) (cf. definition of CT states) for
$|\psi\rangle$ and $|\varphi\rangle$ may be completely different. For example, the overlap between a matrix
product state and a stabilizer state can be estimated efficiently classically with polynomial
accuracy, even though such states are CT for very different reasons.

We are now in a position to prove theorem 3.

Proof of theorem 3: It is sufficient to prove the result for CT states $|\varphi\rangle$ and
$|\psi\rangle$. Let s = poly(n) denote the sparseness of A. Using the notation of (2), we have
$\langle\varphi|A|\psi\rangle = \sum_{i=1}^s a_i$, where we denote

$$a_i := \sum_x \alpha_i(x)\,\langle\varphi|r_i(x)\rangle\langle x|\psi\rangle. \qquad (5)$$

Note that $|\alpha_i(x)| \le 1$. It is sufficient to prove that each of the s quantities $a_i$ can be
estimated efficiently with polynomial accuracy, for then also $\sum_{i=1}^s a_i$ can be estimated with
polynomial accuracy as s = poly(n). To do so, write $p_x := |\langle x|\psi\rangle|^2$ and
$q_x := |\langle x|\varphi\rangle|^2$. Define a function $\delta_i$ by $\delta_i(x) = 1$ if
$p_x \ge q_{r_i(x)}$ and $\delta_i(x) = 0$ otherwise, for every n-bit string x, and define
$\epsilon_i = 1 - \delta_i$. Then $\delta_i$ and $\epsilon_i$ can be evaluated efficiently since
$|\varphi\rangle$ and $|\psi\rangle$ are CT and A is ECS. We split $a_i$ in two parts by inserting
$\delta_i(x) + \epsilon_i(x) = 1$:

$$a_i = \sum_x \langle\varphi|r_i(x)\rangle\langle x|\psi\rangle\,\alpha_i(x)\,\delta_i(x)
      + \sum_x \langle\varphi|r_i(x)\rangle\langle x|\psi\rangle\,\alpha_i(x)\,\epsilon_i(x). \qquad (6)$$

The function $F_i$ defined by

$$F_i(x) = \frac{\langle\varphi|r_i(x)\rangle\langle x|\psi\rangle}{p_x}\,\alpha_i(x)\,\delta_i(x) \qquad (7)$$

is efficiently computable and satisfies $|F_i(x)| \le 1$ for every x. The first term in the r.h.s. of (6)
is hence equal to $\langle F_i\rangle = \sum_x p_x F_i(x)$, which can be estimated to polynomial accuracy
efficiently due to the Chernoff-Hoeffding bound. To estimate the second term in the r.h.s. of (6), one
needs to be careful since the function $r_i$ may not be injective. We proceed as follows. Define the
following function $G_i$:

$$G_i(y) = \sum_{x:\ r_i(x) = y \text{ and } \alpha_i(x) \neq 0}
           \frac{\langle\varphi|y\rangle\langle x|\psi\rangle}{q_y}\,\alpha_i(x)\,\epsilon_i(x), \qquad (8)$$

with the additional convention that $G_i(y)$ is zero if there are no x such that $r_i(x) = y$ and
$\alpha_i(x) \neq 0$. With this definition, the second term in the r.h.s. of (6) is equal to
$\langle G_i\rangle = \sum_y q_y G_i(y)$. We now make the following claims. Claim 1: the function $G_i$
is efficiently computable; and Claim 2: $|G_i(y)| \le s$ for every y. A proof of claims 1 and 2 implies
that $\langle G_i\rangle$ can be estimated in poly-time with polynomial accuracy due to the
Chernoff-Hoeffding bound. But then also $a_i$ can be estimated efficiently, thus completing the proof.

We now prove Claim 1. Since A is s-sparse, every row y has at most s nonzero entries.
Equivalently, the following set contains at most s strings x:

$$\{x : \exists j \in \{1, \dots, s\} \text{ s.t. } y = r_j(x) \text{ and } \alpha_j(x) \neq 0\}. \qquad (9)$$

Hence, a fortiori, for every fixed i there are at most s different x such that $r_i(x) = y$ and
$\alpha_i(x) \neq 0$. Moreover, given an arbitrary y it is possible to efficiently determine all these x's
and the corresponding coefficients $\alpha_i(x)$. This is done in two steps: first, since A is efficiently
row-computable, given a row index y it is possible to compute all (at most s) strings x in the set (9)
in poly-time; second, for all those x one computes $r_i(x)$ and $\alpha_i(x)$ (this is possible in
poly-time since A is efficiently column-computable) and verifies whether $r_i(x)$ is equal to y;
those x for which $r_i(x) = y$ are kept, the others discarded.

It follows that $G_i(y)$ is a sum of at most s = poly(n) terms, each of which is efficiently
computable. Thus, Claim 1 is proved. Moreover, Claim 2 now immediately follows as well, since the
modulus of every term in the sum (8) is at most one and there are at most s terms in the
sum. This proves theorem 3. □
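It may help to spell out why the weaker bound $|G_i(y)| \le s$ (rather than $\le 1$) is harmless. The following is the standard Hoeffding inequality for real random variables, written in notation introduced here for illustration only ($T$ samples $X_1, \dots, X_T$, target accuracy $\varepsilon$, failure probability $\eta$); for the complex-valued $F_i$ and $G_i$ one would apply it to real and imaginary parts separately.

```latex
% X_1, ..., X_T independent, identically distributed, real, with |X_t| \le s
\Pr\left[\,\left|\frac{1}{T}\sum_{t=1}^{T} X_t - \mathbb{E}[X_1]\right| \ge \varepsilon \right]
  \;\le\; 2\exp\left(-\frac{T\varepsilon^{2}}{2s^{2}}\right),
\qquad\text{so}\qquad
T \;\ge\; \frac{2s^{2}}{\varepsilon^{2}}\,\ln\frac{2}{\eta}
\quad\text{samples suffice.}
```

With s = poly(n) and $\varepsilon = 1/\mathrm{poly}(n)$ the required number of samples remains polynomial in n.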

Remark: poly-ECS operations. In the definition of ECS operations and in the subsequent
statement of theorem 3, we have required that the nonzero entries of A can be computed efficiently
with perfect precision. Theorem 3 also holds for sparse operations where, instead, these coefficients
can be estimated efficiently with polynomial accuracy, which is a significant relaxation. Call an
n-qubit operation A ($\|A\| \le 1$) poly-ECS if it is sparse, and if (i) on input of an arbitrary column
index x, it is possible to determine in poly-time all those row indices y such that
$\langle y|A|x\rangle \neq 0$, and if the corresponding nonzero entries $\langle y|A|x\rangle$ can be
estimated in poly-time with polynomial accuracy, and (ii) similarly for the row indices y. Theorem 3
then also holds for poly-ECS operations. The proof is completely analogous to the above proof of
theorem 3. The only difference is that now the functions $F_i(x)$ and $G_i(y)$ can no longer be
computed exactly, but only with polynomial accuracy. However, this suffices to invoke the
Chernoff-Hoeffding bound (cf. the Appendix). This remark will play an important role in the
discussion of Simon's algorithm, i.e. in the proof of theorem 2.

We conclude this section with two corollaries of theorem 3. Corollary 1 shows that expectation
values of local observables can be estimated efficiently classically for every CT state. This result
may potentially be of use in e.g. variational Monte Carlo studies of strongly correlated systems
(this is work in progress). Corollary 2 will be of use when we discuss the Deutsch-Jozsa algorithm
in section 6.2.

Corollary 1 Let $|\psi\rangle$ be an n-qubit CT state and let O be a d-local observable with
$d = O(\log n)$ and $\|O\| \le 1$. Then there exists an efficient classical algorithm to estimate
$\langle\psi|O|\psi\rangle$ with polynomial accuracy.

Proof: this result follows immediately from theorem 3 since every d-local O with $d = O(\log n)$
is ECS. Here we provide a short alternative proof that does not require the formalism used in
the proof of theorem 3. Every observable O of the form considered can be written as a linear
combination of N = poly(n) Pauli operators: $O = \sum_{i=1}^N a_i P_i$, with $|a_i| \le 1$. Consequently,

$$\langle O\rangle := \langle\psi|O|\psi\rangle = \sum_{i=1}^N a_i \langle\psi|P_i|\psi\rangle. \qquad (10)$$

As each $P_i$ is an efficiently computable basis-preserving unitary operation, each state
$P_i|\psi\rangle$ is CT due to lemma 2. Invoking lemma 3, the overlap between $P_i|\psi\rangle$ and
$|\psi\rangle$ can be estimated classically with polynomial accuracy. Hence, also $\langle O\rangle$ can be
estimated classically with polynomial accuracy. This proves the result. □



Corollary 2 Let $|\psi\rangle$ and $|\varphi\rangle$ be CT n-qubit states, let $|\xi\rangle$ and $|\chi\rangle$ be
CT k-qubit states (with $k \le n$) and let A and B be efficiently computable sparse n-qubit operations
with $\|A\|, \|B\| \le 1$. Then there exists an efficient classical algorithm to approximate
$\langle\varphi|A\,[|\xi\rangle\langle\chi| \otimes I]\,B|\psi\rangle$ with polynomial accuracy.

Proof: The proof uses a technique related to the SWAP test. Denote $|\psi'\rangle := B|\psi\rangle$ and
$|\varphi'\rangle := A^\dagger|\varphi\rangle$ (which are potentially unnormalized states) and consider the
following identity:

$$\langle\varphi'|\,[|\xi\rangle\langle\chi| \otimes I]\,|\psi'\rangle
  = [\langle\chi|\langle\varphi'|]\; U_{SWAP}\; [|\xi\rangle|\psi'\rangle], \qquad (11)$$

where the unitary operator $U_{SWAP}$ swaps qubit i with qubit i + k, for every $i = 1, \dots, k$. The
identity (11) can easily be verified. Hence, we have

$$\langle\varphi|A\,[|\xi\rangle\langle\chi| \otimes I]\,B|\psi\rangle
  = [\langle\chi|\langle\varphi|]\,[I \otimes A]\; U_{SWAP}\; [I \otimes B]\,[|\xi\rangle|\psi\rangle]. \qquad (12)$$

Note that the (k + n)-qubit states $|\xi\rangle|\psi\rangle$ and $|\chi\rangle|\varphi\rangle$ are CT. Moreover, it
can easily be verified that $U_{SWAP}$ is ECS. This implies that the operation
$[I \otimes A]\,U_{SWAP}\,[I \otimes B]$ is ECS as well, being a product of three ECS operations.
Theorem 3 can now be applied. □

Note that, as a special case of this last result, it follows that partial overlaps
$\langle\varphi|\,[|\xi\rangle\langle\chi| \otimes I]\,|\psi\rangle$
between CT states can be estimated efficiently classically.
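The SWAP identity used in the proof of corollary 2 is easy to check numerically on small instances. The sketch below does so for random vectors; the sizes k = 2 and n = 3, the helper `rand_vec`, and the use of dense NumPy vectors are choices made purely for this illustration (a dense check is of course exponential in n and is not the simulation method itself).

```python
import numpy as np

rng = np.random.default_rng(0)
k, n = 2, 3  # toy sizes: k-qubit |xi>, |chi> and n-qubit |phi'>, |psi'>

def rand_vec(m):
    """Random normalized m-qubit state vector."""
    v = rng.normal(size=2 ** m) + 1j * rng.normal(size=2 ** m)
    return v / np.linalg.norm(v)

xi, chi = rand_vec(k), rand_vec(k)
phi_p, psi_p = rand_vec(n), rand_vec(n)

# Left-hand side: <phi'| (|xi><chi| (x) I) |psi'>
proj = np.kron(np.outer(xi, chi.conj()), np.eye(2 ** (n - k)))
lhs = phi_p.conj() @ proj @ psi_p

# Right-hand side: (<chi|<phi'|) U_SWAP (|xi>|psi'>), where U_SWAP exchanges
# qubit i with qubit i+k (i = 1..k) on the (k+n)-qubit register.
state = np.kron(xi, psi_p).reshape([2] * (k + n))
for i in range(k):
    state = np.swapaxes(state, i, i + k)
rhs = np.kron(chi, phi_p).conj() @ state.reshape(-1)

identity_holds = np.isclose(lhs, rhs)
```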



5 Applications of theorem 1

Next we discuss three applications of theorem 1 as announced in the introduction. These applications
regard sparse circuits, composability, and CNOT-$e^{i\theta X}$ circuits.



5.1 Classical simulation of sparse circuits 

The following is a formal statement of the classical simulation of sparse circuits which was
announced in the introduction.

Corollary 3 Let U be a circuit composed of m efficiently computable s-sparse unitary operations
with $s^m = \mathrm{poly}(n)$. The circuit acts on an arbitrary product input state and is followed by a Z
measurement of, say, the first qubit. Then this quantum computation can be simulated efficiently
classically.

Proof: Let $|\psi\rangle$ denote the product input state and let $Z_1$ denote the Z observable acting on the
first qubit. The expectation value of $Z_1$ is given by
$\langle Z_1\rangle = \langle\psi|U^\dagger Z_1 U|\psi\rangle$. Note that U is ECS due
to the restrictions on s and m; but then also the observable $O := U^\dagger Z_1 U$ is ECS, being a
product of three ECS operations. Moreover, $|\psi\rangle$ is a product state and hence CT. Theorem 1 can
now be applied. □

As briefly alluded to in the introduction, sparse operations highlight the role of interference, as
opposed to entanglement, in quantum computation. Note that sparse operations may generically
produce highly entangled states. Consider e.g. the simple case where the input is $|+\rangle^n$ and the
entire circuit U is composed of poly(n) CPHASE gates (which are basis-preserving gates and thus
particularly simple examples of sparse operations). With such circuits, it is possible to efficiently
generate e.g. the highly entangled cluster states [30]. On the other hand, if a sparse operation U
acts on a state $|\psi\rangle$, then each coefficient of $U|\psi\rangle$ in the standard basis is a linear
combination of at most poly(n) coefficients of $|\psi\rangle$. Hence, the "interference" in the process
$|\psi\rangle \to U|\psi\rangle$ is limited
(we use the notion of interference in a colloquial sense and do not adopt any technical definition).
Corollary 3 states that quantum computational processes where the interference is "small" in this
sense cannot offer any speed-up compared to classical computers, in spite of the high degrees
of entanglement that may be generated throughout the computation. Corollary 3 may thus be
regarded as complementary to a class of results stating that quantum computations that generate
low amounts of entanglement (quantified appropriately) can be classically simulated efficiently (see

e.g. [UH! HE]). 

Finally, note that in corollary 3 one cannot hope for an improvement of the bound $s^m = \mathrm{poly}(n)$
to e.g. m = poly(n) and s constant (unless BQP = BPP) since every poly-size quantum circuit is
a product of m = poly(n) single- and two-qubit gates, each of which is an s-sparse operation with
s constant.

5.2 Composability 

Theorem 1 immediately leads to a criterion to assess when the composition of two quantum circuits
can be simulated classically. Formally, we have:

Corollary 4 Consider poly-size n-qubit quantum circuits $U_1$ and $U_2$, an input state
$|\psi_{in}\rangle$ and an observable O such that: (i) the state $U_1|\psi_{in}\rangle$ is computationally
tractable and (ii) the operation $U_2^\dagger O U_2$ is efficiently computable sparse. Then the circuit
$U = U_2 U_1$, acting on $|\psi_{in}\rangle$ and followed by measurement of O, can be simulated efficiently
classically.

Next we provide some illustrations of this result. First we provide some examples of pairs (U, O)
such that $U^\dagger O U$ is ECS. All circuit families U below are poly-size.

• Examples of pairs (U, O) where $U^\dagger O U$ is ECS:

- Let U be a circuit of constant depth and let the observable O act nontrivially on $O(\log n)$
qubits. Then $U^\dagger O U$ also acts nontrivially on $O(\log n)$ qubits and is hence an ECS observable.

- Let U represent a Clifford circuit and let O be any observable that is a linear combination
of N = poly(n) Pauli products: $O = \sum_{i=1}^N a_i P_i$ with $|a_i| \le 1$ and $P_i$ Pauli operators. Then
$U^\dagger O U$ is again a linear combination of N Pauli products, and hence ECS.

- Let U be a circuit composed of nearest-neighbor matchgates and let $Z_1$ denote the Pauli
Z operation acting on the first qubit. Then U maps $Z_1$ (under conjugation) to a linear
combination of poly(n) Pauli products (see e.g. fH]), which is an ECS operation. 

Next we explicitly describe two concatenated circuits that can be simulated efficiently using
our results; see also Fig. 1. In both examples, the circuit acts on the all-zeroes computational basis
state and is followed by measurement of Z on the first qubit.

• Examples of corollary 4:

- Consider a quantum circuit $V = V_4 V_3 V_2 V_1$ where $V_1$ is an arbitrary local unitary operation,
$V_2$ represents the quantum Fourier transform (over $\mathbb{Z}_{2^n}$), $V_3$ is an arbitrary efficiently
computable sparse unitary, and $V_4$ is an arbitrary poly-size (nearest-neighbor) matchgate circuit.
Then this circuit can be simulated efficiently classically due to corollary 4. In particular, we show
that corollary 4 can be applied by taking $U_1 = V_2 V_1$ and $U_2 = V_4 V_3$. To see this, note first
that $V_2 V_1$ acting on the input yields a CT state. Further, $(V_4 V_3)^\dagger Z_1 (V_4 V_3)$ is ECS:
indeed, $V_4^\dagger Z_1 V_4$ is a sum of poly(n) Pauli products and hence ECS, and thus
$(V_4 V_3)^\dagger Z_1 (V_4 V_3)$ is ECS as well, being a product of three ECS operations. Corollary 4
can now be applied.

- Consider a quantum circuit $V = V_4 V_3 V_2 V_1$ where $V_1$ is an arbitrary poly-size matchgate
circuit, $V_2$ is a poly-size circuit of Toffoli gates, $V_3$ is an arbitrary poly-size Clifford circuit
and $V_4$ is an arbitrary log-depth circuit consisting of nearest-neighbor gates. We show that
corollary 4 can be applied by taking $U_1 = V_1$ and $U_2 = V_4 V_3 V_2$. To see this, note first that
$V_1$ acting on the input yields a CT state. Further, $(V_4 V_3 V_2)^\dagger Z_1 (V_4 V_3 V_2)$ is ECS:
$V_4^\dagger Z_1 V_4$ acts nontrivially on $O(\log n)$ qubits and is hence a linear combination of poly(n)
Pauli products; but then also $V_3^\dagger V_4^\dagger Z_1 V_4 V_3$ is a linear combination of poly(n) Pauli
products (and hence ECS) since $V_3$ is a Clifford operation; finally, it follows that
$(V_4 V_3 V_2)^\dagger Z_1 (V_4 V_3 V_2)$ is ECS as this operation is a product of three ECS operations.
Corollary 4 thus again yields the desired result.

Several other examples of the above nature can easily be generated.

5.3 Rotated bases and CNOT-$e^{i\theta X}$ circuits

In our definition of computationally tractable states and sparse operations, as well as in the
resulting theorem 1, we have singled out a particular basis, i.e. the computational basis. Note,
however, that in the vast majority of all arguments we have never relied on the specific form of this
basis. Therefore, we may consider a generalized definition of CT states, sparse operations, etc.,
stated relative to an arbitrary basis B, and carry out an analogous program as done so far, leading to
a much broader class of results. Results such as theorem 1 can be transferred in an obvious way, and
will be omitted. Here we limit ourselves to discussing an example that can be understood using
this generalized notion of CT states. This example regards the simulation of circuits composed of
CNOTs and $e^{i\theta X}$ gates. Other examples of similar nature can easily be constructed.

Let $B = \{|b_x\rangle\}$ denote the $|\pm\rangle$ product basis, defined by
$|b_x\rangle \propto \bigotimes_{i=1}^n [\,|0\rangle + (-1)^{x_i}|1\rangle\,]$ for
every n-bit string $x = (x_1, \dots, x_n)$. A state $|\psi\rangle$ is called 'computationally tractable in
the basis B' if it is possible to sample in poly(n) time with classical means from the probability
distribution $\mathrm{Prob}(x) = |\langle b_x|\psi\rangle|^2$, and if the coefficients
$\langle b_x|\psi\rangle$ can be computed in poly(n) time classically. It
is clear that $|\psi\rangle$ is CT in B iff $H^{\otimes n}|\psi\rangle$ is CT in the computational basis. For
example, it can easily be shown that every stabilizer state, as well as any MPS $|\psi\rangle$, is
computationally tractable in the $|\pm\rangle$-basis B, as $H^{\otimes n}|\psi\rangle$ is in both cases CT in
the computational basis.

Similarly, the notion of ECS operations w.r.t. B is defined in the natural way. Obviously, A
is ECS w.r.t. B iff $H^{\otimes n} A H^{\otimes n}$ is ECS in the computational basis. For example, let U
denote an arbitrary poly-size n-qubit circuit composed of CNOT and $e^{i\theta X}$ gates, where $\theta$
may be any (real) angle. Whereas U is generally not ECS in the computational basis, this circuit is
always ECS in the $|\pm\rangle$ basis B. This can be seen as follows. Let $\mathrm{CNOT}_{ab}$ denote a CNOT
gate with control a and target b. One then has the pair of identities

$$H^{\otimes 2}\,\mathrm{CNOT}_{ab}\,H^{\otimes 2} = \mathrm{CNOT}_{ba} \quad\text{and}\quad
  H e^{i\theta X} H = e^{i\theta Z}, \qquad (13)$$

both of which are easily verified. These identities imply that $M := H^{\otimes n} U H^{\otimes n}$ is a
poly-size circuit consisting entirely of CNOT and $e^{i\theta Z}$ gates and is thus ECS (even
basis-preserving) in the computational basis. This shows that U is ECS in the $|\pm\rangle$ product basis.
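The two identities in (13) can be confirmed numerically. The sketch below uses dense 2x2 and 4x4 matrices with the basis ordering $|ab\rangle$ (a the first qubit); the helper `exp_i_theta` and the test angle are choices made for this check only.

```python
import numpy as np

H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
I2 = np.eye(2, dtype=complex)

# Basis ordering |ab>, with a the first (most significant) qubit.
CNOT_ab = np.array([[1, 0, 0, 0],
                    [0, 1, 0, 0],
                    [0, 0, 0, 1],
                    [0, 0, 1, 0]], dtype=complex)  # control a, target b
CNOT_ba = np.array([[1, 0, 0, 0],
                    [0, 0, 0, 1],
                    [0, 0, 1, 0],
                    [0, 1, 0, 0]], dtype=complex)  # control b, target a

def exp_i_theta(P, theta):
    """exp(i*theta*P) for a Pauli matrix P, using P @ P = identity."""
    return np.cos(theta) * I2 + 1j * np.sin(theta) * P

HH = np.kron(H, H)
theta = 0.73  # arbitrary test angle
first_ok = np.allclose(HH @ CNOT_ab @ HH, CNOT_ba)
second_ok = np.allclose(H @ exp_i_theta(X, theta) @ H, exp_i_theta(Z, theta))
```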

One can now consider a generalized form of theorem 1, now stated relative to the $|\pm\rangle$ basis (or
any other basis):

Theorem 1' Let $|\psi_{in}\rangle$ be an n-qubit state, let U denote a poly-size n-qubit circuit and let O
denote an observable. If $|\psi_{in}\rangle$ is CT in B and if $U^\dagger O U$ is ECS in B, then the circuit U,
acting on $|\psi_{in}\rangle$ and followed by measurement of O, can be simulated efficiently classically.

Now consider a CNOT-$e^{i\theta X}$ circuit U as above. The circuit U acts on an arbitrary product input
$|\alpha\rangle$ and is followed by measurement of $Z_1$. We now claim that this computation can be
simulated efficiently classically, using the above variant of theorem 1. To see this, first note that
$|\alpha\rangle$ is CT in B. Second, $O := U^\dagger Z_1 U$ is ECS in B: to show this, note that
$H^{\otimes n} O H^{\otimes n} = M^\dagger X_1 M$. Here, as before, $M := H^{\otimes n} U H^{\otimes n}$ is a
poly-size circuit consisting entirely of CNOT and $e^{i\theta Z}$ gates, and $X_1$ denotes the Pauli X
operation acting on the first qubit. The operation $M^\dagger X_1 M$ is basis-preserving in the
computational basis, hence $O = H^{\otimes n} [M^\dagger X_1 M] H^{\otimes n}$ is basis-preserving in B. This
proves the claim; note that we have hence proved:

Corollary 5 Every poly-size circuit composed of CNOT and $e^{i\theta X}$ gates (for arbitrary real
$\theta$), acting on an arbitrary product input and followed by measurement of $Z_1$, can be simulated
efficiently classically.

6 Simulating quantum algorithms 

In this section we apply our results in the context of quantum algorithms. The idea is to consider
e.g. theorems 1 and 3 and corollary 2 as a collection of 'tests' that every quantum algorithm
claiming to achieve an exponential speed-up needs to pass. We will consider the three classes of
algorithms mentioned in the introduction.

6.1 Potts models 

Here we point out that a recently proposed quantum algorithm [19], concerned with estimating
partition functions of classical spin systems such as the Potts model, can be simulated efficiently
classically. Letting Z denote the Potts model partition function defined on some (arbitrary) lattice,
the quantum algorithm in [19] provides a polynomial approximation of the quantity $Z/\Delta$. Here
$\Delta$ denotes a particular, easy-to-compute normalization factor that depends on the couplings of the
model (see [15], Cor. 5.9, for the precise form of $\Delta$); $\Delta$ is sometimes called the
'approximation scale' of the algorithm. On the other hand, in [31] mappings were established which
allow one to express the same quantity $Z/\Delta$ as the overlap between a suitable product state
$|\alpha\rangle$ and stabilizer state $|\psi\rangle$: $Z/\Delta = \langle\alpha|\psi\rangle$. Note that both
stabilizer states and product states are CT (see section 4). Using theorem 3 (in fact: the special
instance A = I of lemma 3 dealing with overlaps between CT states), we find that overlaps between
stabilizer states and product states can also efficiently be estimated with polynomial accuracy with
classical methods. Hence, the quantity $Z/\Delta$ can also be estimated with polynomial accuracy in
poly-time using classical means, showing that the quantum algorithm in question can be simulated
efficiently classically.

We emphasize that the work [19] contains several quantum algorithms besides the partition
function algorithm focused on here (in particular, the latter does not constitute the main result of
[15]), including algorithms for BQP-complete problems, to which our classical simulation techniques
do not apply.

6.2 Deutsch-Jozsa 

An application of corollary [2] is found by considering the Deutsch-Jozsa (DJ) algorithm [SU]. Recall 
that in the DJ problem one considers a black-box function $f : \{0,1\}^n \to \{0,1\}$ which is promised
to be either constant or balanced [32]. The task is to determine which possibility holds. Classically,
any deterministic solution to the problem requires exponentially many oracle calls, whereas a
randomized classical algorithm can solve the DJ problem with exponentially small probability of
failure using O(n) queries. The DJ quantum algorithm constitutes a deterministic solution to the
problem using a single query of the oracle.
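The randomized classical strategy mentioned here is short enough to write down. The sketch below is a minimal illustration, with the function name `classify` and the integer encoding of n-bit inputs being assumptions of this example: it queries f at uniformly random inputs and answers 'balanced' as soon as two different values are seen.

```python
import random

def classify(f, n, queries, rng):
    """Return 'constant' or 'balanced' for a promised function f on n-bit inputs.
    Errs only if f is balanced and all sampled values coincide, which happens
    with probability 2^(1 - queries) for uniformly random queries."""
    first = f(rng.getrandbits(n))
    for _ in range(queries - 1):
        if f(rng.getrandbits(n)) != first:
            return "balanced"   # two different values: certainly not constant
    return "constant"

rng = random.Random(0)
n = 20
answer_constant = classify(lambda x: 0, n, 2 * n, rng)      # constant function
answer_balanced = classify(lambda x: x & 1, n, 2 * n, rng)  # balanced: lowest bit
```

With 2n queries the failure probability is $2^{1-2n}$, which is exponentially small in n, in line with the O(n) query count quoted above.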

Thus, it is well known that DJ can be simulated classically when an exponentially small probability
of failure is allowed. Here we will reproduce this result, showing that it immediately follows
from corollary 2. Moreover, we will find that a large class of generalizations (to be specified below)
can be efficiently simulated as well. The argument is very general and mainly regards the structure
of the involved circuits.

Going through the steps in the DJ algorithm, it is easily verified that DJ is implemented by a
circuit belonging to the following general class (the system is initialized in the state $|0\rangle^n$):

Round 1: apply a local unitary operation $V_1$;

Round 2: apply an ECS operation $V_2$;

Round 3: apply another local unitary operation $V_3$;

Round 4: measure the observable $O = |0\rangle\langle 0|^{k} \otimes I$, for some $k \le n$.

Using corollary 2 we now immediately find that such a computation can be simulated efficiently
classically. Indeed, the state obtained after Round 1 is a product state and hence CT. Moreover,
the operation in Round 2 is efficiently computable sparse. Finally, the observable
$O' := V_3^\dagger O V_3$ has the form $|\gamma\rangle\langle\gamma| \otimes I$ for some k-qubit product (and
hence CT) state $|\gamma\rangle$. Corollary 2 can now immediately be applied.

Note that, in the argument, the specific form of the function f (computed in Round 2) is
completely irrelevant. This shows that the lack in computational power of the DJ algorithm is a 
structural feature of the circuit. In particular, this computational weakness cannot be overcome 
by e.g. changing the form of the oracle, but must involve a more drastic alteration of the circuit 
structure. 

6.3 Simon's algorithm 

Lastly, we consider Simon's algorithm [17]. As this algorithm has the admirable feature of being a
very simple quantum algorithm that nevertheless achieves an exponential speed-up, it is an ideal
candidate to compare quantum and classical computational power. Simon's algorithm is worth
investigating from a number of angles. As a comprehensive study would lead us too far, here we
single out one particular aspect, namely the surprising role of the round of classical postprocessing
that takes place after the measurement. We will show that this seemingly innocuous
round of classical computation plays a decisive role in the performance of the algorithm.

We first give a short review of Simon's algorithm in section 6.3.1. In section 6.3.2 we take a
small detour, discussing aspects of Fourier analysis of Boolean functions, which will be necessary
to prove theorem 2; the latter is done in section 6.3.3.

6.3.1 Review of Simon's algorithm 

Here we will focus on a decision problem version of Simon's problem, where it is asked to determine 
the i-th bit of the unknown string a for some i. We will fix i = 1 in the following for concreteness. 

Simon's quantum algorithm consists of the following steps. There are two registers, each
consisting of $n$ qubits, each initially prepared in the state $|0\rangle^n$. First a Hadamard operation
is applied to every qubit in the first register. Second, the oracle operator $U_f$ is applied, yielding
a state proportional to $\sum_x |x\rangle|f(x)\rangle$. Third, again a Hadamard operation is applied to
every qubit in the first register. This yields a state of the form
$|\psi_{out}\rangle \propto \sum_{u \in V} |u\rangle|\psi_u\rangle$. Here the sum is over all $n$-bit strings
$u$ that are orthogonal to $a$ (w.r.t. modulo-2 arithmetic). We denote by $V$ the subspace over $Z_2$ of
all such $u$. The $|\psi_u\rangle$ are (irrelevant) normalized states. Next, all qubits in the first register
are measured in the computational basis, yielding a bit string $u$ which is drawn uniformly at random
from the subspace $V$. Running this procedure $N$ times, one generates the $(Nn)$-qubit state
$|\psi_{out}\rangle^{\otimes N}$ and one subsequently obtains $N$ bit strings $u^1, \ldots, u^N$, each drawn
randomly from $V$. We assemble these vectors as the rows of an $N \times n$ matrix, denoted by $u$. If
$N = O(n)$ then the probability that $u^1, \ldots, u^N$ do not span the entire space $V$ is exponentially
small in $n$. In the final step in the algorithm, one uses a classical computer to compute a solution $x$
to the linear system of equations $ux = 0$. More precisely, in the decision problem version of Simon's
algorithm, a function $g : \{0,1\}^{nN} \to \{0,1\}$ is computed which takes the entries of the matrix $u$
as input and which outputs 1 if there exists a solution $x$ where the first bit of $x$ is equal to 1; the
output is zero otherwise. Note that $g$ is efficiently computable classically. If the matrix $u$ has rank
$n - 1$ (which happens in all cases except for an exponentially small fraction) then there is a unique
nontrivial solution, i.e. $x = a$, in which case the function $g(u)$ correctly outputs the first bit of $a$.
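
The classical postprocessing step can be illustrated with a minimal Python sketch, which solves the linear system over Z_2 by Gaussian elimination and evaluates the decision function g. The function names (`nullspace_gf2`, `g`, `sample_from_V`) are hypothetical and not taken from the paper. The sampler emulates the measurement outcomes by using knowledge of the hidden string a, which is an assumption made purely for testing and is of course not available in Simon's problem.

```python
import random

def nullspace_gf2(rows, n):
    """Return a basis (list of n-bit tuples) of {x : u x = 0 mod 2}."""
    rows = [list(r) for r in rows]
    pivots = []  # pivot column of each reduced row
    r = 0
    for col in range(n):
        # Find a row at or below position r with a 1 in this column.
        pivot = next((i for i in range(r, len(rows)) if rows[i][col]), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        # Clear this column in every other row (reduced row echelon form).
        for i in range(len(rows)):
            if i != r and rows[i][col]:
                rows[i] = [a ^ b for a, b in zip(rows[i], rows[r])]
        pivots.append(col)
        r += 1
    free = [c for c in range(n) if c not in pivots]
    basis = []
    for f in free:
        x = [0] * n
        x[f] = 1
        for i, p in enumerate(pivots):
            x[p] = rows[i][f]  # each pivot variable is fixed by the free one
        basis.append(tuple(x))
    return basis

def g(rows, n):
    """1 if some solution of u x = 0 has first bit 1, else 0."""
    return int(any(x[0] for x in nullspace_gf2(rows, n)))

def sample_from_V(a, rng):
    """Emulate one measurement outcome: a uniform u with u.a = 0 (mod 2)."""
    while True:
        u = tuple(rng.randrange(2) for _ in range(len(a)))
        if sum(ui & ai for ui, ai in zip(u, a)) % 2 == 0:
            return u

# Example: hidden string a = (1, 0, 1), rows spanning V = {u : u.a = 0}.
first_bit = g([(1, 0, 1), (0, 1, 0)], 3)
```

Since the solution set of ux = 0 is a linear space, a solution with first bit 1 exists exactly when some basis vector has first bit 1, which is what `g` checks.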

In summary, Simon's algorithm can be implemented with an $(Nn)$-qubit circuit (where $Nn =
\mathrm{poly}(n)$) displaying the following structure; the circuit acts on the all-zeroes computational
basis state.

Round 1: apply a Hadamard gate to some subset of qubits; 

Round 2: apply an efficiently computable basis-preserving unitary operation; 

Round 3: apply another round of Hadamard gates to some subset of the qubits; the latter
subset is denoted by $S$;

Round 4: perform a computational basis measurement on all qubits in S. Denote by u the 
bit string containing all measurement outcomes. 

Round 5: classically compute the value $g(u)$, which represents the output of the algorithm,
where $g$ is some efficiently computable Boolean function.

For the time being, we will consider the above class of 5-round circuits in full generality, and ignore
the specific forms of e.g. the functions $f$ and $g$ needed in Simon's algorithm.

6.3.2 Intermezzo: learning theory 

In order to formally state and prove theorem 2, we first need to briefly discuss some elementary
concepts related to the learning theory of Boolean functions (see e.g. [33]). Readers familiar
with these concepts may immediately skip to section 6.3.3.

1. A Boolean function is any function $g : \{0,1\}^m \to \{0,1\}$. Every Boolean function can be
written in a unique way as a multivariate polynomial $g(x) = \sum_S a_S x^S$ over $Z_2$. In this
expression, the sum ranges over all subsets $S \subseteq \{1, \ldots, m\}$. Moreover one has $a_S \in Z_2$ and
$x^S := \prod_{i \in S} x_i$ for every $S$, and arithmetic is performed over $Z_2$. The ($Z_2$-)degree of $g$ is the
size of the largest set $S$ such that $a_S = 1$.

2. The Fourier transform $\hat g : \{0,1\}^m \to \mathbb{R}$ of $g$ is defined as follows:

$$\hat g(u) = \sum_{x \in \{0,1\}^m} (-1)^{g(x) + u \cdot x} \qquad (14)$$

for every $m$-bit string $u$. The quantities $\hat g(u)$ are called the Fourier coefficients of $g$. If the
function $g$ is computable in poly-time (or provided as an oracle), and if a bit string $u$ is provided
as an input, then there exists an elementary poly-time classical algorithm to estimate
the quantity $2^{-m}\hat g(u)$ with polynomial accuracy. To see this, simply note that $2^{-m}\hat g(u)$
coincides with the expectation value of the (efficiently computable) function $x \to (-1)^{g(x) + u \cdot x}$
w.r.t. the uniform distribution, such that a polynomial approximation of $2^{-m}\hat g(u)$ can be
achieved in poly-time due to the Chernoff-Hoeffding bound.

3. A Boolean function is said to be $s$-sparse if it has precisely $s$ nonzero Fourier coefficients. It
is easily verified that every linear function is 1-sparse. Also, it has been shown that every
Boolean function corresponding to a polynomial of degree $d$ is at least $2^d$-sparse [31]. In this
sense the sparseness of a Boolean function is an indication of its nonlinearity, since high-degree
polynomials necessarily have many nonzero Fourier coefficients [35]. A (family of)
function(s) $g$ is simply called 'sparse' if its sparseness satisfies $s \le \mathrm{poly}(m)$.

4. Interestingly, there exists an efficient algorithm to determine all Fourier coefficients of g that 
are greater than a given threshold value, in the following sense: 

Lemma 4 [36] Suppose that one has access to an oracle computing a Boolean function $g$.
Let $p(m)$ denote an arbitrary polynomial in $m$. Then there exists a poly-time algorithm that
outputs a collection of $m$-bit strings $T \subseteq \{0,1\}^m$ of size poly($m$) containing all $u$ such that
$2^{-m}|\hat g(u)| \ge p(m)^{-1}$.

Together with the remark made in 2, it follows that there exists a poly-time algorithm that
outputs the set $T$ together with polynomial approximations of all the quantities $2^{-m}\hat g(u)$, for
every $u \in T$. Note that lemma 4 is a nontrivial result: indeed, a priori it is not obvious that
the coefficients $\hat g(u)$ that lie above a certain threshold can be determined efficiently, since in
principle there is an exponentially large space of bit strings $u$ to be searched.
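
The notions in points 2 and 3 can be made concrete with a small Python sketch. It computes Fourier coefficients by brute force (feasible only for small m), counts the sparseness, and estimates the normalized coefficient by uniform sampling as described in point 2. All function names are hypothetical, and the sketch does not implement the algorithm of lemma 4, which avoids the exponential search.

```python
import random
from itertools import product

def dot(u, x):
    """Inner product of two bit tuples modulo 2."""
    return sum(a & b for a, b in zip(u, x)) % 2

def fourier_coefficient(g, u, m):
    """Exact coefficient: sum over all x of (-1)^(g(x) + u.x), 2^m terms."""
    return sum((-1) ** (g(x) ^ dot(u, x)) for x in product((0, 1), repeat=m))

def sparseness(g, m):
    """Number of m-bit strings u with a nonzero Fourier coefficient."""
    return sum(1 for u in product((0, 1), repeat=m)
               if fourier_coefficient(g, u, m) != 0)

def estimate_normalized_coefficient(g, u, m, samples, rng):
    """Sample average of (-1)^(g(x) + u.x) over uniformly random x."""
    total = 0
    for _ in range(samples):
        x = tuple(rng.randrange(2) for _ in range(m))
        total += (-1) ** (g(x) ^ dot(u, x))
    return total / samples

# A linear function is 1-sparse; the AND of two bits (degree 2) is 4-sparse.
linear = lambda x: x[0] ^ x[2]
conjunction = lambda x: x[0] & x[1]
s_linear = sparseness(linear, 3)
s_and = sparseness(conjunction, 2)
```

The sample average converges to the normalized coefficient at the rate given by the Chernoff-Hoeffding bound, so a polynomial number of samples suffices for polynomial accuracy.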

6.3.3 Proof of theorem 2 

We are now in a position to formally state theorem 2: 

Theorem 2 Consider a quantum circuit displaying the 5-round structure as in section 6.3.1. If
the function g computed in the round of classical postprocessing is promised to be sparse, then the 
entire circuit can be simulated efficiently classically, independent of the specific forms of the other 
rounds. 

An important ingredient in the proof of theorem 2 will be the $m$-qubit operator $W_g$ (where $m$
denotes the number of bits on which $g$ acts) defined by

$$\langle u|W_g|v\rangle = 2^{-m}\hat g(u + v) \quad \text{for every } u, v \in \{0,1\}^m. \qquad (15)$$

Note that each row and each column of $W_g$ contains precisely $s$ non-zero entries, where $s$ is the
sparseness of $g$; in other words, the Boolean sparseness of $g$ and the sparseness of the operator
$W_g$ coincide. This correspondence prompts the question of when the operator $W_g$ is efficiently
computable sparse. It can easily be seen that $W_g$ is ECS if and only if (i) $g$ is sparse and (ii) there
exists an efficient algorithm to determine all those strings $u$ such that $\hat g(u) \neq 0$ and the values of
the corresponding coefficients $\hat g(u)$. Note however, that finding all $u$ such that $\hat g(u) \neq 0$ is highly
nontrivial since some of the non-zero Fourier coefficients may be exponentially small.
Moreover, for general (efficiently computable) $g$ the problem of computing $\hat g(u)$ with exponential
precision is #P-hard. Therefore, requiring $W_g$ to be ECS is highly stringent.

Fortunately, for our purposes the relevant question will be when $W_g$ can be well-approximated
by an ECS operation $A$ with polynomial accuracy; moreover, $A$ itself need not be ECS in the exact
sense, but poly-ECS as discussed in the remark below theorem 3. These are much less stringent
demands. Approximating $W_g$ by such an $A$ is in fact possible for every sparse
function $g$. This is shown in the following lemma; the proof relies on lemma 4.

Lemma 5 Let $g$ be a sparse Boolean function acting on $m$ bits that is provided as an oracle, let the
operator $W_g$ be defined as in (15) and let $p(m)$ be an arbitrary polynomial. Then there exists a
poly-time classical algorithm that outputs a poly-ECS $m$-qubit operation $A$ such that
$\|W_g - A\| \le p(m)^{-1}$.

Proof: Let $s \le \mathrm{poly}(m)$ denote the sparseness of $g$. Let $\theta > 0$ and let $W_g^\theta$ denote the matrix
obtained by replacing all entries of $W_g$ that are smaller in absolute value than $\theta$ by zero. That
is: $\langle u|W_g^\theta|v\rangle$ is equal to $2^{-m}\hat g(u + v)$ if $|2^{-m}\hat g(u + v)| \ge \theta$, and zero otherwise. For now, $\theta$ is
arbitrary but below we will choose $\theta$ to be a suitable inverse polynomial in $m$. Since $W_g$ is $s$-sparse, the
matrices $W_g^\theta$ and $W_g - W_g^\theta$ are $s$-sparse as well. Due to lemma 4 and the remark below it, for every
$\theta = 1/\mathrm{poly}(m)$, the operator $W_g^\theta$ is poly-ECS. Next we show that $\theta$ can be tuned appropriately
such that $\|W_g - W_g^\theta\| \le p(m)^{-1}$ is satisfied. To do so, let $\|\cdot\|_r$ ($\|\cdot\|_c$) denote the maximum row
(column) sum norm [37]; these norms are related to the spectral norm $\|\cdot\|$ via the inequality
$\|X\|^2 \le \|X\|_r \|X\|_c$ for every matrix $X$ [35]. As the matrix $W_g - W_g^\theta$ is $s$-sparse and as every entry
of this matrix is at most $\theta$ in absolute value, it holds that $\|W_g - W_g^\theta\|_r \le s\theta$ and
$\|W_g - W_g^\theta\|_c \le s\theta$, and hence

$$\|W_g - W_g^\theta\|^2 \le \|W_g - W_g^\theta\|_r \, \|W_g - W_g^\theta\|_c \le (s\theta)^2. \qquad (16)$$

By choosing $\theta := (s\,p(m))^{-1}$ and setting $A := W_g^\theta$ with this choice of $\theta$, we have found a matrix $A$
satisfying the desired conditions. This completes the proof. □
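
The truncation argument can be checked numerically on small instances with the following Python sketch. It builds the normalized Fourier spectrum by brute force, zeroes the entries below a threshold, and compares the spectral norm of the truncation error with the bound given by the sparseness times the threshold. The function names are hypothetical. The spectral norm is computed under the observation (ours, not taken from the paper) that a symmetric matrix whose (u, v) entry depends only on u XOR v is diagonalized by the Hadamard transform, so its eigenvalues are signed sums of its first row.

```python
from itertools import product

def dot(u, x):
    """Inner product of two bit tuples modulo 2."""
    return sum(a & b for a, b in zip(u, x)) % 2

def normalized_spectrum(g, m):
    """Map each m-bit w to 2^-m times its Fourier coefficient.

    The operator W_g has entry c[u XOR v] in row u and column v.
    """
    points = list(product((0, 1), repeat=m))
    return {w: sum((-1) ** (g(x) ^ dot(w, x)) for x in points) / 2 ** m
            for w in points}

def truncate(c, theta):
    """Replace every entry smaller than theta in absolute value by zero."""
    return {w: (v if abs(v) >= theta else 0.0) for w, v in c.items()}

def spectral_norm(c, m):
    """Largest |eigenvalue| of the matrix with entries c[u XOR v]."""
    points = list(product((0, 1), repeat=m))
    return max(abs(sum(v * (-1) ** dot(w, x) for w, v in c.items()))
               for x in points)

def truncation_error(g, m, theta):
    """Return (spectral norm of W_g minus its truncation, the bound s*theta)."""
    c = normalized_spectrum(g, m)
    kept = truncate(c, theta)
    diff = {w: c[w] - kept[w] for w in c}
    s = sum(1 for v in c.values() if v != 0)
    return spectral_norm(diff, m), s * theta

# Example: the AND of three bits, truncated at theta = 0.5.
and3 = lambda x: x[0] & x[1] & x[2]
error, bound = truncation_error(and3, 3, 0.5)
```

For this example only the coefficient at the all-zero string survives the truncation, and the error stays below the bound used in the proof.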

Lemma 5 will be the key ingredient in the proof of theorem 2, which is provided next.

Proof of theorem 2: The analysis will be simplified by considering a slightly alternative version
of the 5-round circuits in question, where now the entire computation is performed coherently
and there is only a single measurement at the end of the computation. To achieve this, first one
goes through rounds 1-3 as indicated. Second, the function $u \to g(u)$ is computed coherently
on the relevant registers, realized by a unitary operation $U_g$ mapping $U_g : |u\rangle \to |g(u)\rangle|\xi_u\rangle$ for
some (irrelevant) states $|\xi_u\rangle$ [39]. Finally, the first qubit is measured in the computational basis.
The overall circuit is denoted by $U_t$. Letting $g$ be an arbitrary sparse function, we thus have
to show that there exists an efficient classical algorithm to approximate
$\langle Z_1\rangle = \langle 0|U_t^\dagger Z_1 U_t|0\rangle$ (where $|0\rangle \equiv |00\ldots0\rangle$) with polynomial accuracy.
For further reference, we denote by $|\psi_2\rangle$ the state obtained after round 2; furthermore,
$\mathcal{H}$ denotes the tensor product of Hadamard gates applied in round 3. Moreover, let $p(n)$
denote an arbitrary polynomial in $n$.

First, remark that the state $|\psi_2\rangle$ is CT. Denoting $O := \mathcal{H} U_g^\dagger Z_1 U_g \mathcal{H}$, one has
$\langle Z_1\rangle = \langle\psi_2|O|\psi_2\rangle$. It is now crucial to note that $O = W_g$, where $W_g$ is defined in
Eq. (15); this identity can easily be verified. This allows us to invoke lemma 5, yielding in poly-time a
poly-ECS operation $A$ satisfying $\|W_g - A\| \le p(n)^{-1}$. Since $A$ is poly-ECS and since $|\psi_2\rangle$ is CT,
according to theorem 3 (cf. also the remark below it) it is possible to approximate
$\langle\psi_2|A|\psi_2\rangle$ with polynomial accuracy in poly-time with classical means. In particular, it is
possible to efficiently generate a number $c$ such that $|c - \langle\psi_2|A|\psi_2\rangle| \le p(n)^{-1}$. Since
$\langle Z_1\rangle = \langle\psi_2|W_g|\psi_2\rangle$, we then have

$$|c - \langle Z_1\rangle| \le |c - \langle\psi_2|A|\psi_2\rangle| + |\langle\psi_2|(W_g - A)|\psi_2\rangle|
\le p(n)^{-1} + \|W_g - A\| \le 2p(n)^{-1}. \qquad (17)$$

In the first inequality we have used the triangle inequality; in the second inequality we have
used that $|\langle\psi_2|(W_g - A)|\psi_2\rangle|$ is not greater than $\|W_g - A\|$; in the third inequality, we have
used that $\|W_g - A\| \le p(n)^{-1}$. This hence shows that a polynomial approximation of $\langle Z_1\rangle$ can
be achieved in poly-time, thus proving the claim. □
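
The identity between the measured expectation value and the matrix element of the operator defined in Eq. (15) can be checked numerically on a few qubits. The Python sketch below does so for a random state, under the simplifying assumption (ours) that the Hadamard round and the function g act on all qubits, so that no additional registers need to be tracked. Bit strings are encoded as integers and all names are hypothetical.

```python
import math
import random

def parity(i):
    """Parity of the number of 1 bits of the integer i."""
    return bin(i).count("1") & 1

def hadamard_all(psi):
    """Apply a Hadamard gate to every qubit of the state vector psi."""
    dim = len(psi)
    scale = 1 / math.sqrt(dim)
    return [scale * sum(psi[v] * (-1) ** parity(u & v) for v in range(dim))
            for u in range(dim)]

def z_expectation_by_measurement(psi, g):
    """Average of (-1)^g(u) over outcomes u measured after the Hadamards."""
    phi = hadamard_all(psi)
    return sum(abs(amp) ** 2 * (-1) ** g(u) for u, amp in enumerate(phi))

def z_expectation_by_fourier_operator(psi, g):
    """Matrix element of the operator with entries 2^-m * g_hat(u XOR v)."""
    dim = len(psi)
    g_hat = [sum((-1) ** (g(x) ^ parity(u & x)) for x in range(dim))
             for u in range(dim)]
    total = sum(psi[u].conjugate() * (g_hat[u ^ v] / dim) * psi[v]
                for u in range(dim) for v in range(dim))
    return total.real

def random_state(m, rng):
    """A normalized random complex state on m qubits."""
    amps = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(2 ** m)]
    norm = math.sqrt(sum(abs(a) ** 2 for a in amps))
    return [a / norm for a in amps]

# Example: g is the majority function on three bits.
majority = lambda u: int(bin(u).count("1") >= 2)
psi = random_state(3, random.Random(1))
lhs = z_expectation_by_measurement(psi, majority)
rhs = z_expectation_by_fourier_operator(psi, majority)
```

The two quantities agree up to floating-point error, as expected from conjugating the diagonal operator with entries (-1)^g(u) by Hadamard gates.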

We now specialize the discussion to Simon's algorithm. Note that the classical postprocessing in
this algorithm is particularly simple, as it merely involves solving a system of linear equations over
$Z_2$. Nevertheless, the function $g$ needed in Simon's algorithm is highly non-sparse. The intuition
of the argument is that the function $g(u)$ is related to the computation of the determinant of a
suitable matrix (or an analogous function in the case of non-square matrices), since the function
$g$ decides whether there exists a nontrivial solution to a certain system of linear equations. It is
known that the determinant function $X \to \det(X)$ corresponds to a polynomial of degree $k$ in
the case of $k \times k$ matrices $X$, i.e. the degree of the polynomial is the square root of the input
size $k^2$ of the determinant function. As the degree of a polynomial provides a lower bound to the
logarithm of the sparseness (see point 3 in section 6.3.2), it follows that the determinant function
has exponentially high sparseness $s \ge 2^k$. An analogous argument can be used to show that the
function $g$ considered in Simon's algorithm has high sparseness parameter $s$.
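
The growth of the sparseness with the degree can be observed directly for very small matrices. The following Python sketch counts, by brute force, the nonzero Fourier coefficients of the determinant modulo 2 of a k x k binary matrix; it is only feasible for k up to 3 or so, and the function names are hypothetical.

```python
def det_gf2(bits, k):
    """Determinant mod 2 of the k x k matrix whose rows are chunks of bits."""
    rows = [list(bits[i * k:(i + 1) * k]) for i in range(k)]
    for col in range(k):
        pivot = next((r for r in range(col, k) if rows[r][col]), None)
        if pivot is None:
            return 0  # no pivot in this column: the matrix is singular
        rows[col], rows[pivot] = rows[pivot], rows[col]
        for r in range(col + 1, k):
            if rows[r][col]:
                rows[r] = [a ^ b for a, b in zip(rows[r], rows[col])]
    return 1  # full rank; signs are irrelevant modulo 2

def truth_table(k):
    """Values of det mod 2 on all 2^(k*k) binary matrices (integer-encoded)."""
    m = k * k
    return [det_gf2([(x >> j) & 1 for j in range(m)], k)
            for x in range(2 ** m)]

def fourier_sparseness(values, m):
    """Count the strings u whose Fourier coefficient is nonzero."""
    signs = [(-1) ** v for v in values]
    count = 0
    for u in range(2 ** m):
        coeff = sum(s if bin(u & x).count("1") % 2 == 0 else -s
                    for x, s in enumerate(signs))
        count += coeff != 0
    return count

# Sparseness of the 2 x 2 and 3 x 3 determinants over Z_2.
s2 = fourier_sparseness(truth_table(2), 4)
s3 = fourier_sparseness(truth_table(3), 9)
```

For k = 2 the determinant is a quadratic function of four bits all of whose 16 Fourier coefficients are nonzero, consistent with the lower bound of 2^k on the sparseness.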

Looking at the problem differently, one can in fact use the proved $\Omega(2^{n/2})$ classical oracle lower
bound for Simon's problem to immediately infer that the function $g$ cannot be sparse. Indeed, if $g$
were sparse then our classical simulation results would imply the existence of a classical algorithm
to solve Simon's problem using poly($n$) classical oracle queries, which is provably not possible.
Note that it is remarkable that the classical query lower bound for the oracle $f$ can hence be used
to infer properties of another function $g$!



7 Matchgates and poly-time classical computation 

We conclude this paper with a result regarding the computational power of matchgate circuits. 
While seemingly disconnected from the rest of the paper, this result will actually follow from our 
discussion of Simon's algorithm. 

Call a family of functions $f_n : \{0,1\}^n \to \{0,1\}$ efficiently matchgate-computable if there exists
a family of nearest-neighbor matchgate circuits $U_n$ acting on $M_n = \mathrm{poly}(n)$ qubits ($n = 1, 2, \ldots$),
such that $U_n$, acting on $|x\rangle|0\rangle^{M_n - n}$ and followed by a $\{|0\rangle, |1\rangle\}$ measurement on the first qubit,
yields the output $f_n(x)$ with probability $p \ge 2/3$, for all $n$-bit strings $x$. Here $U_n$ is to depend only
on the input size $n$ and not on the entire input $x$ (this aspect is important, as will be highlighted
in the proof of theorem 4). Moreover, the circuit family is to be poly-time uniformly generated in
the sense that the description of $U_n$ is to be poly-time computable from the number $n$. Our result
is the following.

Theorem 4 There exist functions that are efficiently computable classically (i.e. functions in P)
that are not efficiently matchgate-computable.

An interesting feature of this result is its proof method. Surprisingly, the proof will follow from
our analysis of Simon's algorithm, even though the latter seems to have nothing to do with
matchgates! Roughly speaking, we will show that if theorem 4 were false, then there would exist
a quantum circuit to solve Simon's problem that can be simulated classically with our methods,
hence resulting in a classical algorithm for Simon's problem that requires only poly($n$) queries to
the oracle. As the latter has been proved to be an impossibility, this will show that theorem 4 has
to be true.

In the proof of theorem 4 we will need the following simple application of corollary 4.

Fact 1: Consider an $n$-qubit quantum circuit $V = V_4 V_3 V_2 V_1$ where both $V_1$ and $V_3$ represent
collections of Hadamards applied to subsets of the qubits, $V_2$ is efficiently computable
basis-preserving, and $V_4$ is a poly-size (nearest-neighbor) matchgate circuit. Then any such circuit
(acting on $|0\rangle^n$ and followed by measurement of $Z_1$) can be simulated efficiently classically due
to corollary 4, taking $V_2 V_1 = U_1$ and $V_4 V_3 = U_2$. Indeed, $V_2 V_1 |0\rangle^n$ is CT and
$(V_4 V_3)^\dagger Z_1 (V_4 V_3)$ is a linear combination of poly($n$) Pauli products and hence ECS.

Proof of theorem 4: Consider the following variant $\tilde g$ of the function $g$ computed in the classical
postprocessing in Simon's algorithm: $\tilde g$ takes an $N \times n$ matrix $u$ together with an integer $i$ between
1 and $n$ (specified in terms of $\log n$ bits) as its inputs, and outputs 1 if and only if there exists
a bit string $x = (x_1, \ldots, x_n)$ satisfying $ux = 0$ and $x_i = 1$. Note that $\tilde g$ is efficiently computable
classically. We claim that $\tilde g$ is not efficiently matchgate-computable. To prove this, we show that
the converse leads to a contradiction. Suppose that $\tilde g$ is efficiently matchgate-computable and let $U$
denote the (family of) matchgate circuit(s) that computes $\tilde g$. Now consider the following quantum
algorithm A: first prepare the state $|i\rangle \otimes |\psi_{out}\rangle^{\otimes N}$ where $N = O(n)$ and where
$|\psi_{out}\rangle \propto \sum_{u \in V} |u\rangle|\psi_u\rangle$ as in section 6.3.1; up to a permutation of the qubits, at this
point the state of the quantum register has the form $\sum_u |i\rangle|u\rangle|\chi_u\rangle$ for some (irrelevant)
normalized $|\chi_u\rangle$, where the sum is over all $N \times n$ matrices $u$ for which each row belongs to $V$.
Second, apply the matchgate circuit $U$ on the relevant registers in order to compute
$|i, u\rangle \to |\tilde g(i, u)\rangle$ in superposition; note that at this point
it is crucial that $U$ only depends on the input size but not on the entire input. Finally, measure
$Z_1$ and let $\langle Z_1\rangle$ denote the expectation value of $Z_1$. If the $i$-th bit of the unknown string $a$ in
Simon's problem is equal to 0, then $\langle Z_1\rangle$ lies exponentially close to 1/3; if this bit is 1 then
$\langle Z_1\rangle$ lies exponentially close to $-1/3$. It is now easily verified that the algorithm A is implemented
with a circuit displaying the structure considered in Fact 1. Hence a polynomial approximation of
$\langle Z_1\rangle$ can be classically achieved in poly-time with exponentially small probability of failure, for
every $i$. Note that such an approximation allows one to decide whether $\langle Z_1\rangle$ lies exponentially close
to 1/3 or $-1/3$. This hence leads to a poly-time classical algorithm to determine $a$. This constitutes a
contradiction, given the $\Omega(2^{n/2})$ classical query lower bound for Simon's problem. Hence, $\tilde g$ cannot
be efficiently matchgate-computable. □
A Appendix: Sampling and the Chernoff-Hoeffding bound

The Chernoff-Hoeffding bound is a tool to assess with which precision the expectation value of
a random variable may be approximated in terms of 'sample averages'. This bound asserts the
following. Let $X_1, \ldots, X_K$ be i.i.d. real-valued random variables with $E := \mathbb{E} X_i$ and $X_i \in [-1, 1]$
for every $i = 1, \ldots, K$. Then

$$\mathrm{Prob}\left\{ \left| \frac{1}{K} \sum_{i=1}^{K} X_i - E \right| \le \epsilon \right\} \ge 1 - 2e^{-K\epsilon^2/4}. \qquad (18)$$

In the case of complex-valued random variables $X_i$, a similar bound can be obtained for $|X_i| \le 1$
by splitting $X_i$ into its real and imaginary parts and using (18) on both of these parts. In this
work we will consider the Chernoff-Hoeffding bound in the following context. Let $\mathcal{P} := \{p_x\}$ be
a probability distribution on the set of $n$-bit strings $x \in \{0,1\}^n$ and let $x \to F(x) \in \mathbb{C}$ be a
complex function such that $|F(x)| \le 1$ for every $x$. Let $\langle F\rangle = \sum_x p_x F(x)$ denote the expectation
value of $F$. The goal is to approximate $\langle F\rangle$ by sampling from the distribution $\mathcal{P}$. To do so,
consider $K$ $n$-bit strings $x^1, \ldots, x^K$ drawn (independently) from the distribution $\mathcal{P}$, and denote
the average $a := K^{-1} \sum_{i=1}^{K} F(x^i)$. The Chernoff-Hoeffding bound then implies the following. For
every $\epsilon = p(n)^{-1}$, where $p(n)$ represents an arbitrary polynomial in $n$, there exists a $K$ that scales
at most polynomially with $n$, such that the inequality $|a - \langle F\rangle| \le \epsilon$ holds with a probability that
is exponentially (in $n$) close to 1. In other words, by taking poly($n$) samples $x^i$ it is possible to
estimate $\langle F\rangle$ with an error that scales as $p(n)^{-1}$ for every choice of $p(n)$. We will henceforth
denote this type of estimate as an approximation with 'polynomial accuracy' or a 'polynomial
approximation'. Note that a polynomial approximation achieves an estimate of $\langle F\rangle$ up to $O(\log n)$
significant bits.

Moreover, if the function $F$ can be evaluated in poly-time and if it is possible to sample in poly-time
from $\mathcal{P}$, then the quantity $a$ can be computed in poly-time. Hence, an overall efficient method
is achieved to compute a polynomial approximation of $\langle F\rangle$ with exponentially small probability
of failure. In this paper we will mostly ignore the fact that the Chernoff-Hoeffding bound yields
polynomial approximations that do not succeed with unit probability but rather with a probability
that is exponentially close to one. When the notion of a polynomial approximation is considered
in the text, we will mean a polynomial approximation that is achieved with a probability that is
exponentially close to one.
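
The sampling procedure can be sketched in a few lines of Python. The sketch assumes a bound of the form 1 - 2 exp(-K eps^2 / 4) on the success probability, for real-valued F with values in [-1, 1]; the constant 4 in the exponent is an assumption of this illustration (Hoeffding's inequality implies it, and in fact gives the slightly stronger constant 2). The names `samples_needed` and `estimate_expectation` are hypothetical.

```python
import math
import random

def samples_needed(eps, delta):
    """Smallest K with 2 * exp(-K * eps^2 / 4) <= delta (assumed bound)."""
    return math.ceil(4 * math.log(2 / delta) / eps ** 2)

def estimate_expectation(sample, F, K, rng):
    """Average of F over K independent draws produced by sample(rng)."""
    return sum(F(sample(rng)) for _ in range(K)) / K

# Example: uniform 4-bit strings and F(x) = (-1)^(x_1 AND x_2), whose exact
# expectation value is 3/4 - 1/4 = 1/2.
rng = random.Random(0)
uniform_bits = lambda r: tuple(r.randrange(2) for _ in range(4))
F = lambda x: (-1) ** (x[0] & x[1])
K = samples_needed(0.1, 1e-6)
estimate = estimate_expectation(uniform_bits, F, K, rng)
```

The number of samples grows only as the inverse square of the accuracy and logarithmically in the inverse failure probability, which is why an accuracy of 1/poly(n) with exponentially small failure probability still requires only poly(n) samples.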

We discuss two immediate generalizations of the above arguments. First, above we have required
that the function $F$ can be evaluated with perfect precision in poly-time. Such perfect
accuracy is in this context not necessary. In particular, with similar methods as above, a polynomial
approximation of $\langle F\rangle$ can be achieved in poly-time if $F(x)$ itself can be approximated with
polynomial accuracy in poly-time. This can be seen as follows. Suppose that, on input of an
arbitrary $x$, a polynomial approximation of $F(x)$ can be achieved in poly-time. Let $p(n)$ be an
arbitrary polynomial and consider $K$ $n$-bit strings $x^1, \ldots, x^K$ drawn from the distribution $\mathcal{P}$ as
before. Then for large enough $K$ (where $K$ scales as a polynomial in $n$ with suitably high degree),
$K^{-1} \sum_{i=1}^{K} F(x^i)$ lies $\epsilon$-close to $\langle F\rangle$, where $\epsilon = (2p(n))^{-1}$. As each of the $K$ quantities $F(x^i)$ can
be approximated with polynomial accuracy in poly-time by assumption, it is possible to efficiently
generate $K$ complex numbers $c_i$ ($i = 1, \ldots, K$) such that $|c_i - F(x^i)| \le (2p(n))^{-1}$. Using the
triangle inequality and denoting $c := K^{-1} \sum_{i=1}^{K} c_i$, it then easily follows that $|\langle F\rangle - c| \le p(n)^{-1}$.

Second, so far we have considered functions $F$ satisfying $\|F\| := \max_x |F(x)| \le 1$. Note that
similar conclusions can be reached for functions satisfying $\|F\| \le \mathrm{poly}(n)$.

The discussion in the present section can be summarized as follows. 

Theorem 5 (Chernoff-Hoeffding bound) Suppose that it is possible to sample in poly-time
with classical means from a probability distribution $\{p_x\}$ on the set of $n$-bit strings. Let
$F : \{0,1\}^n \to \mathbb{C}$ denote a function satisfying $\|F\| \le \mathrm{poly}(n)$. Moreover, suppose that it is possible to
efficiently estimate $x \to F(x)$ with polynomial accuracy on a classical computer. Then there exists
an efficient classical algorithm to estimate $\langle F\rangle$ with polynomial accuracy.